From: Kirill Smelkov <kirr@navytux.spb.ru>
To: Junio C Hamano <gitster@pobox.com>
Cc: kirr@mns.spb.ru, git@vger.kernel.org
Subject: Re: [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Date: Tue, 8 Apr 2014 00:26:17 +0400 [thread overview]
Message-ID: <20140407202616.GA4140@mini.zxlink> (raw)
In-Reply-To: <xmqqmwfxm2rw.fsf@gitster.dls.corp.google.com> <xmqqr459m4j9.fsf@gitster.dls.corp.google.com>
On Mon, Apr 07, 2014 at 10:29:46AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@navytux.spb.ru> writes:
>
> > The following
> > ...
> > maybe looks a bit simpler, but calls tree_entry_pathcmp twice more times.
> >
> > Besides for important nparent=1 case we were not calling
> > tree_entry_pathcmp at all and here we'll call it once, which would slow
> > execution down a bit, as base_name_compare shows measurable enough in profile.
> > To avoid that we'll need to add 'if (i==imin) continue' and this won't
> > be so simple then. And for general nparent case, as I've said, we'll be
> > calling tree_entry_pathcmp twice more times...
> >
> > Because of all that I'd suggest to go with my original version.
>
> OK.
Thanks.
> > ... After some break on the topic,
> > with a fresh eye I see a lot of confusion goes from the notation I've
> > chosen initially (because of how I was reasoning about it on paper, when
> > it was in flux) - i.e. xi for x[imin] and also using i as looping
> > variable. And also because xi was already used for x[imin] I've used
> > another letter 'k' denoting all other x'es, which leads to confusion...
> >
> >
> > I propose we do the following renaming to clarify things:
> >
> > A/a -> T/t (to match resulting tree t name in the code)
> > X/x -> P/p (to match parents trees tp in the code)
> > i -> imin (so that i would be free for other tasks)
> >
> > then the above (with a prologue) would look like
> >
> > ---- 8< ----
> > * T P1 Pn
> > * - - -
> > * |t| |p1| |pn|
> > * |-| |--| ... |--| imin = argmin(p1...pn)
> > * | | | | | |
> > * |-| |--| |--|
> > * |.| |. | |. |
> > * . . .
> > * . . .
> > *
> > * at any time there could be 3 cases:
> > *
> > * 1) t < p[imin];
> > * 2) t > p[imin];
> > * 3) t = p[imin].
> > *
> > * Schematic deduction of what every case means, and what to do, follows:
> > *
> > * 1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
> > *
> > * 2) t > p[imin]
> > *
> > * 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
> > * 2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
> > *
> > * 3) t = p[imin]
> > *
> > * 3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
> > * 3.2) pi = p[imin] -> investigate δ(t,pi)
> > * |
> > * |
> > * v
> > *
> > * 3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
> > *
> > * ⎧δ(t,pi) - if pi=p[imin]
> > * -> D += ⎨
> > * ⎩"+t" - if pi>p[imin]
> > *
> > *
> > * in any case t↓ ∀ pi=p[imin] pi↓
> > ...
> > now xk is gone and i matches p[i] (= pi) etc so variable names correlate
> > to algorithm description better.
> >
> > Does that maybe clarify things?
>
> That sounds more consistent (modulo perhaps s/argmin/min/ at the
> beginning?).
Thanks. argmin is there on purpose - min(p1...pn) is the minimal p, and
argmin(p1...pn) is imin such that p[imin] is minimal. As we are finding
the index of the minimal element we should use argmin.
> > P.S. Sorry for maybe some crept-in mistakes - I've tried to verify it
> > thoroughly, but am too sleepy to be completely sure. On the other hand I
> > think and hope the patch should be ok.
>
> Thanks and do not be sorry for "mistakes"---we have the review
> process exactly for catching them.
Thanks, I appreciate that.
On Mon, Apr 07, 2014 at 11:07:47AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@navytux.spb.ru> writes:
>
> >> > + if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER)) {
> >> > + for (i = 0; i < nparent; ++i)
> >> > + if (tp[i].entry.mode & S_IFXMIN_NEQ)
> >> > + goto skip_emit_tp;
> >> > + }
> >> > +
> >> > + p = emit_path(p, base, opt, nparent,
> >> > + /*t=*/NULL, tp, imin);
> >> > +
> >> > + skip_emit_tp:
> >> > + /* ∀ xk=ximin xk↓ */
> >> > + update_tp_entries(tp, nparent);
> >>
> >> There are parents whose path sort earlier than what is in 't'
> >> (i.e. they were lost in the result---we would want to show
> >> removal). What makes us jump to the skip label?
> >>
> >> We are looking at path in 't', and some parents have paths that
> >> sort earlier than that path. We will not go to skip label if
> >> any one of the parent's entry sorts after some other parent (or
> >> the parent in question has ran out its entries), which means we
> >> show the entry from the parents only when all the parents have
> >> that same path, which is missing from 't'.
> >>
> >> I am not sure if I am reading this correctly, though.
> >>
> >> For the two-way diff, the above degenerates to "show all parent
> >> entries that come before the first entry in 't'", which is correct.
> >> For the combined diff, the current intersect_paths() makes sure that
> >> each path appears in all the pair-wise diff between t and tp[],
> >> which again means that the above logic match the current behaviour.
> >
> > Yes, correct (modulo we *will* go to skip label if any one of the
> > parent's entry sorts after some other parent). By definition of combined
> > diff we show a path only if it shows in every diff D(T,Pi), and if
> >
> > 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
> >
> > some pj sorts after p[imin] that would mean that Pj does not have
> > p[imin] and since t > p[imin] (which means T does not have p[imin]
> > either) diff D(T,Pj) does not have p[imin]. And because of that we know
> > the whole combined-diff will not have p[imin] as, by definition,
> > combined diff is sets intersection and one of the sets does not have
> > that path.
> >
> > ( In usual words p[imin] is not changed between Pj..T - it was
> > e.g. removed in Pj~, so merging parents to T does not bring any new
> > information wrt path p[imin] and that is why we do not want to show
> > p[imin] in combined-diff output - no new change about that path )
> >
> > So nothing to append to the output, and update minimum tree entries,
> > preparing for the next step.
>
> That's all in line with the current and traditional definition of
> combined diff.
Ok.
> This is a tangent that is outside the scope of this current topic,
> but I wonder if you found it disturbing that we treat the result 't'
> that has a path and the result 't' that does not have a path with
> respect to a parent that does not have the path in a somewhat
> assymmetric way.
>
> With a merge M between commits A and B, where they all have the same
> path with different contents, we obviously show that path in the
> combined diff format. A merge N that records exactly the same tree
> as M that merges the same commits A and B plus another commit C that
> does not have that path still shows the combined diff, with one
> extra column to express "everything in the result N has been added
> with respect to C which did not have the path at all".
>
> However, a merge O between the same commits A and B, where A and B
> have a path and O loses it, shows the path in the combined format.
> A merge P among the same A, B and an extra parent C that does not
> have that path ceases to show it (this is the assymmetry).
Symmetry properties are very important and if something does not fit into
symmetry - then something somewhere is really wrong for sure, but I
think there is no asymmetry here. In your example
M∆∙ N∆∙
/ \ . '''
/ .'\ ' '
A∆ B∙ Cø
\ ''/. ' '
\ / '''
Oø Pø
let's say a path can be:
ø - empty
∆ - triangle
∙ - bullet
∆∙ - triangle+bullet
then some symmetry operation is
∆∙ <-> ø
∙ <-> ∙
∆ <-> ∆
so you "mirror" ∆∙ to ø (M,N->O,P), and so ø is mirrored back to ∆∙
(O,P->M,N). But then, when we mirror the whole graph, Cø should be
mirrored to C'∆∙ and then that would be correct:
A∆ B∙ C'∆∙
\ ''/. ' '
\ / '''
Oø Pø
P to A,B,C' would show the path as N to A,B,C(without')
In other words a merge P among the same A, B and extra parent C' with
"contains everything" content is symmetrical to merge N to A,B and C
with empty content.
> It is a natural extension of "Do not show the path when the result
> matches one of the parent" rule, and in this case the result P takes
> contents, "the path does not exist", from one parent "C", so it is
> internally consistent, and I originally designed it that way on
> purpose, but somehow it feels a bit strange.
I hope it does not fill strange once the true symmetry is discovered,
and also P does not add a change compared to C, so the combined diff
should be empty as no new information is present in a merge itself.
A bit unusual on the first glance, but not strange, once you are used to
it and consistent.
Thanks,
Kirill
next prev parent reply other threads:[~2014-04-07 20:23 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-24 16:21 [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Kirill Smelkov
2014-02-24 16:21 ` [PATCH 01/19] combine-diff: move show_log_first logic/action out of paths scanning Kirill Smelkov
2014-02-24 16:21 ` [PATCH 02/19] combine-diff: move changed-paths scanning logic into its own function Kirill Smelkov
2014-02-24 16:21 ` [PATCH 03/19] tree-diff: no need to manually verify that there is no mode change for a path Kirill Smelkov
2014-02-24 16:21 ` [PATCH 04/19] tree-diff: no need to pass match to skip_uninteresting() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 05/19] tree-diff: show_tree() is not needed Kirill Smelkov
2014-02-24 16:21 ` [PATCH 06/19] tree-diff: consolidate code for emitting diffs and recursion in one place Kirill Smelkov
2014-02-24 16:21 ` [PATCH 07/19] tree-diff: don't assume compare_tree_entry() returns -1,0,1 Kirill Smelkov
2014-02-24 16:21 ` [PATCH 08/19] tree-diff: move all action-taking code out of compare_tree_entry() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 09/19] tree-diff: rename compare_tree_entry -> tree_entry_pathcmp Kirill Smelkov
2014-02-24 16:21 ` [PATCH 10/19] tree-diff: show_path prototype is not needed anymore Kirill Smelkov
2014-02-24 16:21 ` [PATCH 11/19] tree-diff: simplify tree_entry_pathcmp Kirill Smelkov
2014-03-24 21:25 ` Junio C Hamano
2014-03-25 9:23 ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 12/19] tree-diff: remove special-case diff-emitting code for empty-tree cases Kirill Smelkov
2014-03-24 21:18 ` Junio C Hamano
2014-03-25 9:20 ` Kirill Smelkov
2014-03-25 17:45 ` Junio C Hamano
2014-03-26 18:32 ` Kirill Smelkov
2014-03-25 22:07 ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 13/19] tree-diff: diff_tree() should now be static Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 14/19] tree-diff: rework diff_tree interface to be sha1 based Kirill Smelkov
2014-03-24 21:36 ` Junio C Hamano
2014-03-25 9:22 ` Kirill Smelkov
2014-03-25 17:46 ` Junio C Hamano
2014-03-26 19:52 ` Kirill Smelkov
2014-03-26 21:34 ` Junio C Hamano
2014-03-27 14:24 ` Kirill Smelkov
2014-03-27 18:48 ` Junio C Hamano
2014-03-27 19:43 ` Kirill Smelkov
2014-03-28 6:52 ` Johannes Sixt
2014-03-28 17:06 ` Junio C Hamano
2014-03-28 17:46 ` Johannes Sixt
2014-03-28 18:36 ` Junio C Hamano
2014-03-28 19:08 ` Johannes Sixt
2014-03-28 19:27 ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 15/19] tree-diff: no need to call "full" diff_tree_sha1 from show_path() Kirill Smelkov
2014-03-27 14:21 ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 16/19] tree-diff: reuse base str(buf) memory on sub-tree recursion Kirill Smelkov
2014-03-24 21:43 ` Junio C Hamano
2014-03-25 9:23 ` Kirill Smelkov
2014-03-27 14:22 ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 17/19] Portable alloca for Git Kirill Smelkov
2014-02-28 10:58 ` Thomas Schwinge
2014-02-28 13:44 ` Erik Faye-Lund
2014-02-28 13:50 ` Erik Faye-Lund
2014-02-28 17:00 ` Kirill Smelkov
2014-02-28 17:19 ` Erik Faye-Lund
2014-03-05 9:31 ` Kirill Smelkov
2014-03-24 21:47 ` Junio C Hamano
2014-03-27 14:22 ` Kirill Smelkov
2014-04-09 12:48 ` Kirill Smelkov
2014-04-09 13:01 ` Erik Faye-Lund
2014-04-10 17:30 ` Junio C Hamano
2014-02-24 16:21 ` [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Kirill Smelkov
2014-03-27 14:23 ` Kirill Smelkov
2014-04-04 18:42 ` Junio C Hamano
2014-04-06 21:46 ` Kirill Smelkov
2014-04-07 17:29 ` Junio C Hamano
2014-04-07 20:26 ` Kirill Smelkov [this message]
2014-04-07 18:07 ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 19/19] combine-diff: speed it up, by using multiparent diff tree-walker directly Kirill Smelkov
2014-02-24 23:43 ` [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Duy Nguyen
2014-02-25 10:38 ` Kirill Smelkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140407202616.GA4140@mini.zxlink \
--to=kirr@navytux.spb.ru \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kirr@mns.spb.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).