Git development
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Michael Montalbo <mmontalbo@gmail.com>
Subject: Re: [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers
Date: Fri, 22 May 2026 14:29:03 +0900	[thread overview]
Message-ID: <xmqq8q9cui5c.fsf@gitster.g> (raw)
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com> (Michael Montalbo via GitGitGadget's message of "Fri, 22 May 2026 02:11:19 +0000")

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series adds diff.<driver>.process, a long-running subprocess protocol
> that lets external tools provide hunks to git's diff and blame pipelines.
>
> Over the past 18 years, git's diff pipeline accumulated many features that
> operate on hunks: word diff, function context, color-moved, indent
> heuristic, blame. External tools can replace the pipeline entirely
> (diff.<driver>.command) or select among builtin algorithms
> (diff.<driver>.algorithm), but there is no way for a tool to provide
> line-change information into the pipeline. Tools that understand code
> structure (tree-sitter parsers, format-aware analyzers, tools like
> Difftastic and Mergiraf) must bypass git's pipeline and lose access to
> everything downstream.
>
> The protocol follows filter.<driver>.process: pkt-line over stdin/stdout,
> capability negotiation, one tool invocation per git command. The tool
> receives file pairs and returns hunk descriptors that git feeds into the
> standard xdiff pipeline. All output features work normally.
>
> Zero hunks with status=success means the tool considers the files
> equivalent. git diff shows no output for the file, and git blame skips the
> commit, attributing lines to earlier commits.
>
> On error or tool crash, git falls back silently to the builtin diff
> algorithm. The feature is opt-in via diff.<driver>.process and
> .gitattributes; unconfigured files are unaffected.
>
> The series includes git diff-process-normalize, a built-in tool that
> compares files line by line ignoring whitespace (same logic as "git diff -w"
> via xdiff_compare_lines):

Interesting.

If the goal is purely to normalize content before comparison
(e.g. stripping comments or canonicalizing formatting), we already
have the `textconv` mechanism.  While `textconv` is a "one-shot"
per-file process, it is significantly simpler.

I suspect, however, that the primary focus here is to allow external
tools to provide structural alignment (e.g. for AST- aware diffs
like Difftastic or Mergiraf) without losing the original content in
the display.  Unlike `textconv`, which transforms the text the user
sees, this protocol lets the display remain identical to the source
while using a custom engine for the line-matching logic.

If that is the intent, it should be stated more explicitly in the
documentation and commit messages.  The "whitespace-normalize"
demonstration in [PATCH 5/5] is misleading because it's exactly the
case where `textconv` would be sufficient.

I am afraid that the use of a long-running subprocess for every
diff/blame invocation adds significant complexity and overhead.  In
particular, wouldn't the `blame` implementation performs a
round-trip to the subprocess for every commit in the history?  Even
with a persistent process, the overhead of serializing and
deserializing the entire file content twice (old and new) for every
commit could be prohibitive for large files or deep histories.

So, I dunno.

      parent reply	other threads:[~2026-05-22  5:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-22  2:11 [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 1/5] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-22  5:29   ` Junio C Hamano
2026-05-22  2:11 ` [PATCH 2/5] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 3/5] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 4/5] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer Michael Montalbo via GitGitGadget
2026-05-22  5:29 ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq8q9cui5c.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=mmontalbo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox