From: Johannes Altmanninger <aclopte@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>,
"Git Mailing List" <git@vger.kernel.org>,
"Jeff King" <peff@peff.net>,
"Jonathan Nieder" <jrnieder@gmail.com>,
"Sergey Organov" <sorganov@gmail.com>,
"Bagas Sanjaya" <bagasdotme@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Neeraj Singh" <nksingh85@gmail.com>
Subject: Re: [PATCH v2 7/8] diff: add ability to insert additional headers for paths
Date: Wed, 29 Dec 2021 01:16:47 +0100 [thread overview]
Message-ID: <20211229001647.6pv5damtyt3dsiyr@gmail.com> (raw)
In-Reply-To: <CABPp-BH5XUsmTo=BD7osUgi4o=eFWgaQkN1qYDky6uqb9SykHA@mail.gmail.com>
On Tue, Dec 28, 2021 at 01:09:57PM -0800, Elijah Newren wrote:
> On Tue, Dec 28, 2021 at 2:57 AM Johannes Altmanninger <aclopte@gmail.com> wrote:
> >
> > On Sat, Dec 25, 2021 at 07:59:18AM +0000, Elijah Newren via GitGitGadget wrote:
> > > + for (i = 0; i < q->nr; i++) {
> > > + struct diff_filepair *p = q->queue[i];
> > > + char *path = p->one->path ? p->one->path : p->two->path;
> > > +
> > > + if (strmap_contains(o->additional_path_headers, path))
> > > + strset_add(&present, path);
> > > + }
> > > +
> > > + /*
> > > + * Loop over paths in additional_path_headers; for each NOT already
> > > + * in diff_queued_diff, create a synthetic filepair and insert that
> > > + * into diff_queued_diff.
> > > + */
> > > + strmap_for_each_entry(o->additional_path_headers, &iter, e) {
> > > + if (!strset_contains(&present, e->key)) {
> > > + struct diff_filespec *one, *two;
> > > + struct diff_filepair *p;
> > > +
> > > + one = alloc_filespec(e->key);
> > > + two = alloc_filespec(e->key);
> > > + fill_filespec(one, null_oid(), 0, 0);
> > > + fill_filespec(two, null_oid(), 0, 0);
> > > + p = diff_queue(q, one, two);
> > > + p->status = DIFF_STATUS_MODIFIED;
> > > + }
> > > + }
> >
> > All these string hash-maps are not really typical for a C program. I'm sure
> > they are the best choice for an advanced merge algorithm
>
> Agreed up to here.
>
> > but they are not
> > really necessary for computing/printing a diff.
>
> Technically agree that it _could_ be solved a different way, but the
> strmaps are a much more natural solution to this problem in this
> particular case; more on this below.
Oh yeah, I agree that strmaps are the more intuitive solution.
>
> > It feels like this is an
> > implementation detail from merge-ort that's leaking into other components.
>
> And I disagree here, on _both_ the explicit point and the underlying
> suggestion that you seem to be making that strmap should be avoided
> outside of merging. The strmap.[ch] type was originally a suggestion
> from Peff for areas of git completely unrelated to merging (see the
> beginning of https://lore.kernel.org/git/20200821194857.GD1165@coredump.intra.peff.net/,
> and the first link in that email). It's a new datatype for git, much
> like strbuf or string_list or whatever before it, that is there to be
> used when it's a natural fit for the problem at hand. The lack of
> strmap previously led folks to abuse other existing data structures
> (and in a way that often led to poor performance to boot).
Right, all those rename-detection performance fixes were pretty dazzling
>
> > What we want to do is
> >
> > for file_pair in additional_headers:
> > if not already_queued(file_pair):
> > queue(file_pair)
>
> Yes, precisely.
>
> > to do that, you use a temporary has-set ("present") that records everything
> > that's already queued (already_queued() is a lookup in that set).
> >
> > Let's assume both the queue and additional_headers are sorted arrays.
>
> That's a bad assumption; we can't rely on *either* being sorted. I
OK, I hadn't checked if the queue is sorted
> actually started my implementation by trying exactly what you mention
> first; I too thought it'd be more natural and clearer to do this. Of
> course, before implementing it, I had to verify whether
> diff_queued_diff was sorted. So, I added some code that would check
> the order and fail if the queue wasn't sorted. 7 of the test files in
> the regression testsuite had one or more failing tests.
>
> I think the queue was intended to be sorted (see
> diffcore_fix_diff_index()), but in practice it's not. And I'm worried
> that if I find the current cases where it fails to be sorted and "fix"
> them (though I don't actually know if this was intentional or not so I
> don't know if that's really a fix or a break), that I'd end up with
> additional cases in the future where they fail to be sorted anyway.
> So, no matter what, relying on diff_queued_diff being sorted seems
> ill-advised.
>
> Also...
>
> > Then we could efficiently merge them (like a merge-sort algorithm)
> > without ever allocating a temporary hash map.
> >
> > I haven't checked if this is practical (better wait for feedback).
> > We'd probably need to convert the strmap additional_path_headers into an
> > array and sort it (I guess our hash map does not guarantee any ordering?)
>
> Right, strmap has no ordering either. I was willing to stick those
> into a string_list and sort them, but making temporary copies of both
> the strmap and the diff_queued_diff just to sort them so that I can
But you already sort diff_queued_diff at the end of
create_filepairs_for_header_only_notifications(), so sorting a bit earlier
in that function, before enqueueing the new entries won't change the final
result, and allows us to work with a sorted queue; no need for a temporary
copy (we'd only need to copy the strmap).
> reasonably cheaply ask "are items from this thing present in this
> other thing?" seems to be stretching things a bit too far.
> maps/hashes provide a very nice "is this item present" lookup and are
> a natural way to ask that. Since that is exactly the question I am
> asking, I think they are the better data structure here.
Yeah that makes sense. In theory if we ask
"What is the union of the queued pairs and the extra pairs induced by
conflict messages?" we could abstract away the "is this item present"
lookup but in practice that's hard.
> So, this was not at all a leak of merge-ort datastructures, but rather a
> picking of the appropriate data structures for the problem at hand.
I think we have two viable solutions to this problem
1. use a temporary strset to figure out which pairs to add
2. use a temporary array, sort it, and "merge" the two arrays
I agree that 1 is more intuitive and natural for humans, and it's probably the
way to go. But it is a bit less elegant because it adds a strmap entry for
each pair in the queue, whereas 2 only needs to add an array element for
each pair with non-content conflicts, which are much fewer. (Okay that's a
minor detail.) With the right abstractions 2 is pretty simple as well:
j = 0
extra_headers = sorted((key, val) for key, val in additional_headers)
for i in 0..len(queue):
while j < len(extra_headers) && compare(extra_headers[j].key, queue[i]) <= 0:
if compare(extra_headers[j].key, queue[i]) < 0:
enqueue(file_pair_for(extra_headers[j]))
j++
where
def compare(key: str, pair: diff_filepair) -> int:
other = pair.one ? pair.one.path : pair.two.path # Mimic diffnamecmp
return strcmp(key, other)
next prev parent reply other threads:[~2021-12-29 0:16 UTC|newest]
Thread overview: 113+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-21 18:05 [PATCH 0/9] Add a new --remerge-diff capability to show & log Elijah Newren via GitGitGadget
2021-12-21 18:05 ` [PATCH 1/9] tmp_objdir: add a helper function for discarding all contained objects Elijah Newren via GitGitGadget
2021-12-21 23:26 ` Junio C Hamano
2021-12-21 23:51 ` Elijah Newren
2021-12-22 6:23 ` Junio C Hamano
2021-12-25 2:29 ` Elijah Newren
2021-12-21 18:05 ` [PATCH 2/9] ll-merge: make callers responsible for showing warnings Elijah Newren via GitGitGadget
2021-12-21 21:19 ` Ævar Arnfjörð Bjarmason
2021-12-21 21:57 ` Elijah Newren
2021-12-21 23:02 ` Ævar Arnfjörð Bjarmason
2021-12-21 23:15 ` Elijah Newren
2021-12-21 23:44 ` Junio C Hamano
2021-12-23 18:26 ` Elijah Newren
2021-12-21 18:05 ` [PATCH 3/9] merge-ort: capture and print ll-merge warnings in our preferred fashion Elijah Newren via GitGitGadget
2021-12-22 0:00 ` Junio C Hamano
2021-12-23 18:36 ` Elijah Newren
2021-12-21 18:05 ` [PATCH 4/9] merge-ort: mark a few more conflict messages as omittable Elijah Newren via GitGitGadget
2021-12-22 0:06 ` Junio C Hamano
2021-12-23 18:38 ` Elijah Newren
2021-12-21 18:05 ` [PATCH 5/9] merge-ort: make path_messages available to external callers Elijah Newren via GitGitGadget
2021-12-21 18:05 ` [PATCH 6/9] diff: add ability to insert additional headers for paths Elijah Newren via GitGitGadget
2021-12-22 0:24 ` Junio C Hamano
2021-12-25 2:35 ` Elijah Newren
2021-12-21 18:05 ` [PATCH 7/9] merge-ort: format messages slightly different for use in headers Elijah Newren via GitGitGadget
2021-12-21 18:05 ` [PATCH 8/9] show, log: provide a --remerge-diff capability Elijah Newren via GitGitGadget
2021-12-21 21:23 ` Ævar Arnfjörð Bjarmason
2021-12-21 22:18 ` Elijah Newren
2021-12-21 18:05 ` [PATCH 9/9] doc/diff-options: explain the new --remerge-diff option Elijah Newren via GitGitGadget
2021-12-21 21:28 ` Ævar Arnfjörð Bjarmason
2021-12-21 22:24 ` Elijah Newren
2021-12-21 23:47 ` Ævar Arnfjörð Bjarmason
2021-12-22 19:05 ` Elijah Newren
2021-12-21 23:20 ` [PATCH 0/9] Add a new --remerge-diff capability to show & log Junio C Hamano
2021-12-21 23:43 ` Elijah Newren
2021-12-22 0:33 ` Junio C Hamano
2021-12-25 7:59 ` [PATCH v2 0/8] " Elijah Newren via GitGitGadget
2021-12-25 7:59 ` [PATCH v2 1/8] show, log: provide a --remerge-diff capability Elijah Newren via GitGitGadget
2021-12-28 10:56 ` Johannes Altmanninger
2021-12-28 22:34 ` Elijah Newren
2021-12-28 23:01 ` brian m. carlson
2021-12-28 23:45 ` Elijah Newren
2021-12-25 7:59 ` [PATCH v2 2/8] log: clean unneeded objects during `log --remerge-diff` Elijah Newren via GitGitGadget
2021-12-25 7:59 ` [PATCH v2 3/8] ll-merge: make callers responsible for showing warnings Elijah Newren via GitGitGadget
2021-12-28 10:56 ` Johannes Altmanninger
2021-12-28 19:37 ` Elijah Newren
2021-12-28 22:05 ` Johannes Altmanninger
2021-12-25 7:59 ` [PATCH v2 4/8] merge-ort: capture and print ll-merge warnings in our preferred fashion Elijah Newren via GitGitGadget
2021-12-25 7:59 ` [PATCH v2 5/8] merge-ort: mark a few more conflict messages as omittable Elijah Newren via GitGitGadget
2021-12-25 7:59 ` [PATCH v2 6/8] merge-ort: format messages slightly different for use in headers Elijah Newren via GitGitGadget
2021-12-26 18:30 ` In-tree strbuf "in-place" search/replace (was: [PATCH v2 6/8] merge-ort: format messages slightly different for use in headers) Ævar Arnfjörð Bjarmason
2021-12-28 10:56 ` [PATCH v2 6/8] merge-ort: format messages slightly different for use in headers Johannes Altmanninger
2021-12-28 21:48 ` Elijah Newren
2021-12-25 7:59 ` [PATCH v2 7/8] diff: add ability to insert additional headers for paths Elijah Newren via GitGitGadget
2021-12-28 10:57 ` Johannes Altmanninger
2021-12-28 21:09 ` Elijah Newren
2021-12-29 0:16 ` Johannes Altmanninger [this message]
2021-12-30 22:04 ` Elijah Newren
2021-12-31 3:07 ` Johannes Altmanninger
2021-12-25 7:59 ` [PATCH v2 8/8] show, log: include conflict/warning messages in --remerge-diff headers Elijah Newren via GitGitGadget
2021-12-28 10:57 ` Johannes Altmanninger
2021-12-28 23:42 ` Elijah Newren
2021-12-26 21:52 ` [PATCH v2 0/8] Add a new --remerge-diff capability to show & log Ævar Arnfjörð Bjarmason
2021-12-27 21:11 ` Elijah Newren
2022-01-10 15:48 ` Ævar Arnfjörð Bjarmason
2021-12-28 10:55 ` Johannes Altmanninger
2021-12-30 23:36 ` [PATCH v3 0/9] " Elijah Newren via GitGitGadget
2021-12-30 23:36 ` [PATCH v3 1/9] show, log: provide a --remerge-diff capability Elijah Newren via GitGitGadget
2022-01-19 15:49 ` Ævar Arnfjörð Bjarmason
2022-01-20 2:31 ` Elijah Newren
2022-01-20 7:53 ` Elijah Newren
2022-01-19 16:01 ` Ævar Arnfjörð Bjarmason
2022-01-20 2:33 ` Elijah Newren
2021-12-30 23:36 ` [PATCH v3 2/9] log: clean unneeded objects during `log --remerge-diff` Elijah Newren via GitGitGadget
2021-12-30 23:36 ` [PATCH v3 3/9] ll-merge: make callers responsible for showing warnings Elijah Newren via GitGitGadget
2022-01-19 16:41 ` Ævar Arnfjörð Bjarmason
2022-01-20 3:29 ` Elijah Newren
2021-12-30 23:36 ` [PATCH v3 4/9] merge-ort: capture and print ll-merge warnings in our preferred fashion Elijah Newren via GitGitGadget
2021-12-30 23:36 ` [PATCH v3 5/9] merge-ort: mark a few more conflict messages as omittable Elijah Newren via GitGitGadget
2021-12-30 23:36 ` [PATCH v3 6/9] merge-ort: format messages slightly different for use in headers Elijah Newren via GitGitGadget
2021-12-30 23:36 ` [PATCH v3 7/9] diff: add ability to insert additional headers for paths Elijah Newren via GitGitGadget
2021-12-30 23:36 ` [PATCH v3 8/9] show, log: include conflict/warning messages in --remerge-diff headers Elijah Newren via GitGitGadget
2022-01-19 16:19 ` Ævar Arnfjörð Bjarmason
2022-01-21 2:16 ` Elijah Newren
2022-01-21 16:55 ` Elijah Newren
2021-12-30 23:36 ` [PATCH v3 9/9] merge-ort: mark conflict/warning messages from inner merges as omittable Elijah Newren via GitGitGadget
2021-12-31 8:46 ` [PATCH v3 0/9] Add a new --remerge-diff capability to show & log Junio C Hamano
2022-01-21 19:12 ` [PATCH v4 00/10] " Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 01/10] show, log: provide a --remerge-diff capability Elijah Newren via GitGitGadget
2022-02-01 9:09 ` Ævar Arnfjörð Bjarmason
2022-02-01 16:40 ` Elijah Newren
2022-01-21 19:12 ` [PATCH v4 02/10] log: clean unneeded objects during `log --remerge-diff` Elijah Newren via GitGitGadget
2022-02-01 9:35 ` Ævar Arnfjörð Bjarmason
2022-02-01 16:54 ` Elijah Newren
2022-02-02 11:17 ` Ævar Arnfjörð Bjarmason
2022-01-21 19:12 ` [PATCH v4 03/10] ll-merge: make callers responsible for showing warnings Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 04/10] merge-ort: capture and print ll-merge warnings in our preferred fashion Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 05/10] merge-ort: mark a few more conflict messages as omittable Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 06/10] merge-ort: format messages slightly different for use in headers Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 07/10] diff: add ability to insert additional headers for paths Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 08/10] show, log: include conflict/warning messages in --remerge-diff headers Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 09/10] merge-ort: mark conflict/warning messages from inner merges as omittable Elijah Newren via GitGitGadget
2022-01-21 19:12 ` [PATCH v4 10/10] diff-merges: avoid history simplifications when diffing merges Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 00/10] Add a new --remerge-diff capability to show & log Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 01/10] show, log: provide a --remerge-diff capability Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 02/10] log: clean unneeded objects during `log --remerge-diff` Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 03/10] ll-merge: make callers responsible for showing warnings Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 04/10] merge-ort: capture and print ll-merge warnings in our preferred fashion Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 05/10] merge-ort: mark a few more conflict messages as omittable Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 06/10] merge-ort: format messages slightly different for use in headers Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 07/10] diff: add ability to insert additional headers for paths Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 08/10] show, log: include conflict/warning messages in --remerge-diff headers Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 09/10] merge-ort: mark conflict/warning messages from inner merges as omittable Elijah Newren via GitGitGadget
2022-02-02 2:37 ` [PATCH v5 10/10] diff-merges: avoid history simplifications when diffing merges Elijah Newren via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211229001647.6pv5damtyt3dsiyr@gmail.com \
--to=aclopte@gmail.com \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=jrnieder@gmail.com \
--cc=newren@gmail.com \
--cc=nksingh85@gmail.com \
--cc=peff@peff.net \
--cc=sorganov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).