From: Junio C Hamano <gitster@pobox.com>
To: Justin Tobler <jltobler@gmail.com>
Cc: Jeff King <peff@peff.net>, git@vger.kernel.org, ps@pks.im
Subject: Re: [PATCH 0/3] batch blob diff generation
Date: Sun, 15 Dec 2024 15:24:11 -0800
Message-ID: <xmqqa5cw8r8k.fsf@gitster.g>
In-Reply-To: <xmqqldwj9pq0.fsf@gitster.g> (Junio C. Hamano's message of "Fri, 13 Dec 2024 14:34:47 -0800")
Junio C Hamano <gitster@pobox.com> writes:
> Jeff King <peff@peff.net> writes:
>
>> So ideally we'd have an input format that encapsulates that extra
>> context data and provides some mechanism for quoting. And it turns out
>> we do: the --raw diff format.
>
> Funny. The raw diff format indeed was designed as an interchange
> format from various "compare two sets of things" front-ends (like
> diff-files, diff-cache, and diff-tree) that emit the raw format, to
> be read by "diff-helper" (initially called "diff-tree-helper") that
> takes the raw format and
>
> - matches removed and added paths with similar contents to detect
> renames and copies
>
> - computes the output in various formats including "patch".
>
> So I guess we came full circle, finally ;-). Looking in the archive
> for messages exchanged between junkio@ and torvalds@ mentioning diff
> before 2005-05-30 finds some interesting gems.
>
> https://lore.kernel.org/git/7v1x8zsamn.fsf_-_@assigned-by-dhcp.cox.net/
So, if we were to do what Justin tried to do honoring the overall
design of our diff machinery, I think what we can do is as follows:
* Use the "diff --raw" output format as the input, but with a bit
of a twist.
(1) a narrow special case that takes only a single diff_filepair
of <old> and <new> blobs, and immediately runs diff_queue() on
that single diff_filepair, which is Justin's use case. For
this mode of operation, "flush after each record of input"
may be sufficient.
(2) as a general "interchange format" to feed "comparison between
two sets of <object, path>" into our diff machinery, we are
better off if we can treat the input stream as multiple
records that describe comparisons between two sets. Imagine
"git log --oneline --first-parent -2 --raw HEAD", where one
set of "diff --raw" records shows the changed blobs with their
paths between HEAD~1 and HEAD, and another set does so for
HEAD~2 and HEAD~1. We need to be able to tell where the
first set ends and the second set starts, so that rename
detection and other things, if requested, can be done within
each set.
My recommendation is to use a single blank line as a separator,
e.g.
:100644 100644 ce31f93061 9829984b0a M Documentation/git-refs.txt
:100644 100644 8b3882cff1 4a74f7c7bd M refs.c
:100755 100755 1bfff3a7af f59bc4860f M t/t1460-refs-migrate.sh

:100644 100644 c11213f520 8953d1c6d3 M refs/files-backend.c
:100644 100644 b2e3ba877d bec5962deb M refs/reftable-backend.c
so an application that wants to compare only one diff_filepair
at a time would issue something like
:100644 100644 ce31f93061 9829984b0a M Documentation/git-refs.txt

:100644 100644 8b3882cff1 4a74f7c7bd M refs.c

:100755 100755 1bfff3a7af f59bc4860f M t/t1460-refs-migrate.sh
so the parsing machinery does not have to worry about case (1) above.
* Parse and append the input into diff_queue(), until you see a
blank line.
- At EOF, if you have something accumulated in diff_queue(), show
it first (as in the next step). In any case, at EOF, you are
done.
* Run diffcore_std() followed by diff_flush() to have the contents
of the queue nicely formatted and emptied. Go back to parsing
more input lines.
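To make the loop above concrete, here is a rough standalone sketch in Python, not git code. It assumes the documented raw layout (a tab before the path, and a second tab-separated path for renames and copies) and ignores path quoting and "-z". The names RawRecord, parse_raw_line and process_stream are made up for illustration, and the flush callback merely stands in for the diffcore_std() + diff_flush() step. Skipping empty sets (two blank lines in a row) is a choice of this sketch, not something decided above.

```python
import io
from typing import Callable, Iterable, List, NamedTuple, Optional


class RawRecord(NamedTuple):
    """One "diff --raw" record (hypothetical container, not a git type)."""
    old_mode: str
    new_mode: str
    old_oid: str
    new_oid: str
    status: str              # single status letter, e.g. "M" or "R"
    score: Optional[int]     # similarity score that may follow the letter
    path: str
    new_path: Optional[str]  # destination path for renames and copies


def parse_raw_line(line: str) -> RawRecord:
    """Parse ':<mode> <mode> <oid> <oid> <status>TAB<path>[TAB<path>]'."""
    if not line.startswith(":"):
        raise ValueError(f"not a raw diff record: {line!r}")
    meta, _, paths = line[1:].rstrip("\n").partition("\t")
    fields = meta.split(" ")
    if len(fields) != 5 or not fields[4] or not paths:
        raise ValueError(f"malformed raw diff record: {line!r}")
    old_mode, new_mode, old_oid, new_oid, status = fields
    path, _, new_path = paths.partition("\t")
    score = int(status[1:]) if len(status) > 1 else None
    return RawRecord(old_mode, new_mode, old_oid, new_oid,
                     status[0], score, path, new_path or None)


def process_stream(lines: Iterable[str],
                   flush: Callable[[List[RawRecord]], None]) -> None:
    """Queue records until a blank line or EOF, then hand the set to flush."""
    queue: List[RawRecord] = []
    for line in lines:
        if line.strip():
            queue.append(parse_raw_line(line))
        elif queue:
            flush(queue)  # stand-in for diffcore_std() + diff_flush()
            queue = []    # start a fresh set; the flushed list is untouched
    if queue:             # EOF with something still accumulated
        flush(queue)


# Small demonstration: two sets separated by one blank line.
stream = io.StringIO(
    ":100644 100644 ce31f93061 9829984b0a M\tDocumentation/git-refs.txt\n"
    ":100644 100644 8b3882cff1 4a74f7c7bd M\trefs.c\n"
    "\n"
    ":100644 100644 c11213f520 8953d1c6d3 M\trefs/files-backend.c\n"
)
sets: List[List[RawRecord]] = []
process_stream(stream, sets.append)
print([[record.path for record in batch] for batch in sets])
```

Because each set is handed over as a whole, anything that needs to look across records (rename detection, for instance) sees exactly one set at a time, and the one-filepair-per-set case needs no separate code path.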