From: Justin Tobler <jltobler@gmail.com>
To: git@vger.kernel.org
Cc: ps@pks.im, karthik.188@gmail.com, phillip.wood123@gmail.com,
	Justin Tobler <jltobler@gmail.com>
Subject: [PATCH v3 0/3] batch blob diff generation
Date: Tue, 25 Feb 2025 17:39:22 -0600
Message-ID: <20250225233925.1345086-1-jltobler@gmail.com>
In-Reply-To: <20250212041825.2455031-1-jltobler@gmail.com>

Through git-diff(1) it is possible to generate a diff directly between
two blobs. This is particularly useful when the pre-image and post-image
blobs are known and we only care about the diff between them.
Unfortunately, if a user has a batch of known blob pairs to compute
diffs for, there is currently no way to do so via a single Git
process.

To enable support for batch diffs of multiple blob pairs, this series
introduces a new diff plumbing command git-diff-pairs(1) based on a
previous patch series submitted by Peff[1]. This command uses
NUL-delimited raw diffs as its source of input to control exactly which
filepairs are diffed. The advantage of the raw diff format is that each
line already embeds the diff status and object context information,
which makes generating diffs more efficient because we can avoid
peeling revisions to get some of the same info.

For example:

    git diff-tree -r -z -M $old $new |
    git diff-pairs -p -z

Here the output of git-diff-tree(1) is fed to git-diff-pairs(1) to
generate the same output that would be expected from `git diff-tree -p
-M`. While by itself not particularly useful, this means it is possible
to split git-diff-tree(1) output across multiple git-diff-pairs(1)
processes. Such a feature is useful on the server side, where computing
diffs for a large set of changes all at once may not be feasible due to
timeout concerns.
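
To illustrate the splitting idea, here is a minimal Python sketch. It is
not part of this series. The helper names `split_raw_records` and
`chunk_records` are hypothetical, and the parsing assumes the `-z` raw
format as documented for git-diff-tree(1): a `:`-prefixed metadata field
followed by one path, or two paths for renames and copies.

```python
from typing import Iterator, List


def split_raw_records(data: bytes) -> List[bytes]:
    """Split `git diff-tree -r -z` output into complete raw records.

    Each returned record keeps its NUL terminators, so records can be
    concatenated again and written to a consumer's stdin unchanged.
    """
    fields = data.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL
    records: List[bytes] = []
    i = 0
    while i < len(fields):
        meta = fields[i].split(b" ")
        if not fields[i].startswith(b":") or len(meta) != 5:
            raise ValueError(f"expected raw diff metadata, got {fields[i]!r}")
        # Renames (R) and copies (C) carry a source and a destination path.
        npaths = 2 if meta[4][:1] in (b"R", b"C") else 1
        parts = fields[i : i + 1 + npaths]
        if len(parts) != 1 + npaths:
            raise ValueError("truncated raw diff record")
        records.append(b"\0".join(parts) + b"\0")
        i += 1 + npaths
    return records


def chunk_records(records: List[bytes], size: int) -> Iterator[bytes]:
    """Yield byte strings holding at most `size` whole records each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield b"".join(records[start : start + size])
```

Each chunk could then be written to the stdin of its own
`git diff-pairs -p -z` process. Splitting on record boundaries rather
than on every NUL keeps rename and copy entries intact.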

This command can be viewed as a backend tool that exposes Git's diff
machinery. In its current form, the frontend that generates the raw diff
lines used as input is expected to do most of the heavy lifting (i.e.
pathspec limiting, tree object expansion).
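
As a rough sketch of what that frontend work might look like, the Python
below filters already-split raw records by a path prefix and flags
records that still refer to tree objects. It is an assumption-laden
illustration, not code from this series: `record_fields`,
`is_tree_entry` and `limit_to_prefix` are hypothetical names, the prefix
match is a simplistic stand-in for real pathspec handling, and records
are assumed to be NUL-terminated `-z` raw entries. In practice a
frontend can also pass the pathspec and `-r` to git-diff-tree(1) and let
Git do this work.

```python
from typing import Iterable, List, Tuple

TREE_MODE = b"040000"  # mode used for tree entries in raw diff output


def record_fields(record: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Return (metadata fields, paths) for one NUL-terminated raw record."""
    meta, *paths = record.rstrip(b"\0").split(b"\0")
    return meta.lstrip(b":").split(b" "), paths


def is_tree_entry(record: bytes) -> bool:
    """True if either side of the pair is a tree (not expanded with -r)."""
    fields, _ = record_fields(record)
    return TREE_MODE in fields[:2]


def limit_to_prefix(records: Iterable[bytes], prefix: bytes) -> List[bytes]:
    """Keep records where any path (source or destination) starts with prefix."""
    return [
        record
        for record in records
        if any(path.startswith(prefix) for path in record_fields(record)[1])
    ]
```

A frontend could use `is_tree_entry` to detect input that would be
rejected and regenerate it with `-r` before feeding it onward.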

This series is structured as follows:

    - Patch 1 adds some new helper functions to get access to the queued
      `diff_filepair` after `diff_queue()` is invoked.

    - Patch 2 introduces the new git-diff-pairs(1) plumbing command.

    - Patch 3 allows git-diff-pairs(1) to immediately compute diffs
      queued on stdin when a NUL-byte is written after a raw input line
      instead of waiting for stdin to close.
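
The flush behaviour from patch 3 can be pictured with a small Python
sketch that builds the input stream for a long-running process. This is
only an illustration under the assumption, taken from the description
above, that a lone NUL byte written after a complete raw record triggers
computation of the queued diffs. The name `with_flush_markers` is
hypothetical.

```python
from typing import Iterable, List

FLUSH = b"\0"  # assumed flush marker: an extra NUL after a complete record


def with_flush_markers(batches: Iterable[List[bytes]]) -> bytes:
    """Concatenate batches of raw records, ending each with a flush marker.

    Every record must already be in NUL-terminated `-z` raw form. Empty
    batches are skipped so a marker always follows at least one record.
    """
    out = bytearray()
    for batch in batches:
        if not batch:
            continue
        for record in batch:
            if not record.endswith(b"\0"):
                raise ValueError("raw records must be NUL-terminated")
            out += record
        out += FLUSH
    return bytes(out)
```

A caller that keeps the pipe open would write one batch plus its marker,
read the resulting diffs, and then send the next batch, rather than
closing stdin to get any output.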

Changes since V2:

    - Pathspecs are not supported and are thus rejected when provided as
      arguments. Support could be added in a future series.

    - Tree objects present in `diff-pairs` input are rejected. Support
      for tree objects could be added in the future, but for now they
      are rejected so that support can be added later in a backwards
      compatible manner.

    - The -z option is required by git-diff-pairs(1). The NUL-delimited
      raw diff format is the only accepted form of input. Consequently,
      NUL-delimited output is the only option in the `--raw` mode.

    - git-diff-pairs(1) defaults to patch output instead of raw output.
      This better fits the intended use case of the command.

    - A NUL-byte is now always used as the delimiter between batches of
      file pair diffs when queued diffs are explicitly computed by
      writing a NUL-byte on stdin.

    - Several other small cleanups and fixes along with documentation
      changes.

Changes since V1:

    - Changed from git-diff-blob(1) to git-diff-pairs(1) based on a
      previously submitted series.

    - Instead of each line containing a pair of blob revisions, the raw
      diff format is used as input which already has diff status and
      object context embedded.

-Justin

[1]: <20161201204042.6yslbyrg7l6ghhww@sigill.intra.peff.net>

Justin Tobler (3):
  diff: return diff_filepair from diff queue helpers
  builtin: introduce diff-pairs command
  builtin/diff-pairs: allow explicit diff queue flush

 .gitignore                        |   1 +
 Documentation/git-diff-pairs.adoc |  60 +++++++++
 Documentation/meson.build         |   1 +
 Makefile                          |   1 +
 builtin.h                         |   1 +
 builtin/diff-pairs.c              | 206 ++++++++++++++++++++++++++++++
 command-list.txt                  |   1 +
 diff.c                            |  70 +++++++---
 diff.h                            |  25 ++++
 git.c                             |   1 +
 meson.build                       |   1 +
 t/meson.build                     |   1 +
 t/t4070-diff-pairs.sh             |  83 ++++++++++++
 13 files changed, 432 insertions(+), 20 deletions(-)
 create mode 100644 Documentation/git-diff-pairs.adoc
 create mode 100644 builtin/diff-pairs.c
 create mode 100755 t/t4070-diff-pairs.sh

-- 
2.48.1
