All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kirill Smelkov <kirr@navytux.spb.ru>
To: Junio C Hamano <gitster@pobox.com>
Cc: kirr@mns.spb.ru, git@vger.kernel.org
Subject: Re: [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Date: Tue, 8 Apr 2014 00:26:17 +0400	[thread overview]
Message-ID: <20140407202616.GA4140@mini.zxlink> (raw)
In-Reply-To: <xmqqmwfxm2rw.fsf@gitster.dls.corp.google.com> <xmqqr459m4j9.fsf@gitster.dls.corp.google.com>

On Mon, Apr 07, 2014 at 10:29:46AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@navytux.spb.ru> writes:
> 
> > The following
> > ...
> > maybe looks a bit simpler, but calls tree_entry_pathcmp twice more times.
> >
> > Besides for important nparent=1 case we were not calling
> > tree_entry_pathcmp at all and here we'll call it once, which would slow
> > execution down a bit, as base_name_compare shows measurable enough in profile.
> > To avoid that we'll need to add 'if (i==imin) continue' and this won't
> > be so simple then. And for general nparent case, as I've said, we'll be
> > calling tree_entry_pathcmp twice more times...
> >
> > Because of all that I'd suggest to go with my original version.
> 
> OK.

Thanks.

> > ... After some break on the topic,
> > with a fresh eye I see a lot of confusion goes from the notation I've
> > chosen initially (because of how I was reasoning about it on paper, when
> > it was in flux) - i.e. xi for x[imin] and also using i as looping
> > variable. And also because xi was already used for x[imin] I've used
> > another letter 'k' denoting all other x'es, which leads to confusion...
> >
> >
> > I propose we do the following renaming to clarify things:
> >
> >     A/a     ->      T/t     (to match resulting tree t name in the code)
> >     X/x     ->      P/p     (to match parents trees tp in the code)
> >     i       ->      imin    (so that i would be free for other tasks)
> >
> > then the above (with a prologue) would look like
> >
> > ---- 8< ----
> >  *       T     P1       Pn
> >  *       -     -        -
> >  *      |t|   |p1|     |pn|
> >  *      |-|   |--| ... |--|      imin = argmin(p1...pn)
> >  *      | |   |  |     |  |
> >  *      |-|   |--|     |--|
> >  *      |.|   |. |     |. |
> >  *       .     .        .
> >  *       .     .        .
> >  *
> >  * at any time there could be 3 cases:
> >  *
> >  *      1)  t < p[imin];
> >  *      2)  t > p[imin];
> >  *      3)  t = p[imin].
> >  *
> >  * Schematic deduction of what every case means, and what to do, follows:
> >  *
> >  * 1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓
> >  *
> >  * 2)  t > p[imin]
> >  *
> >  *     2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
> >  *     2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓
> >  *
> >  * 3)  t = p[imin]
> >  *
> >  *     3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
> >  *     3.2) pi = p[imin]  ->  investigate δ(t,pi)
> >  *      |
> >  *      |
> >  *      v
> >  *
> >  *     3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->
> >  *
> >  *                       ⎧δ(t,pi)  - if pi=p[imin]
> >  *              ->  D += ⎨
> >  *                       ⎩"+t"     - if pi>p[imin]
> >  *
> >  *
> >  *     in any case t↓  ∀ pi=p[imin]  pi↓
> > ...
> > now xk is gone and i matches p[i] (= pi) etc so variable names correlate
> > to algorithm description better.
> >
> > Does that maybe clarify things?
> 
> That sounds more consistent (modulo perhaps s/argmin/min/ at the
> beginning?).

Thanks. argmin is there on purpose - min(p1...pn) is the minimal p, and
argmin(p1...pn) is imin such that p[imin] is minimal. As we are finding
the index of the minimal element we should use argmin.


> > P.S. Sorry for maybe some crept-in mistakes - I've tried to verify it
> > thoroughly, but am too sleepy to be completely sure. On the other hand I
> > think and hope the patch should be ok.
> 
> Thanks and do not be sorry for "mistakes"---we have the review
> process exactly for catching them.

Thanks, I appreciate that.


On Mon, Apr 07, 2014 at 11:07:47AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@navytux.spb.ru> writes:
> 
> >> > +			if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER)) {
> >> > +				for (i = 0; i < nparent; ++i)
> >> > +					if (tp[i].entry.mode & S_IFXMIN_NEQ)
> >> > +						goto skip_emit_tp;
> >> > +			}
> >> > +
> >> > +			p = emit_path(p, base, opt, nparent,
> >> > +					/*t=*/NULL, tp, imin);
> >> > +
> >> > +		skip_emit_tp:
> >> > +			/* ∀ xk=ximin  xk↓ */
> >> > +			update_tp_entries(tp, nparent);
> >> 
> >> There are parents whose path sort earlier than what is in 't'
> >> (i.e. they were lost in the result---we would want to show
> >> removal).  What makes us jump to the skip label?
> >> 
> >>     We are looking at path in 't', and some parents have paths that
> >>     sort earlier than that path.  We will not go to skip label if
> >>     any one of the parent's entry sorts after some other parent (or
> >>     the parent in question has ran out its entries), which means we
> >>     show the entry from the parents only when all the parents have
> >>     that same path, which is missing from 't'.
> >> 
> >> I am not sure if I am reading this correctly, though.
> >> 
> >> For the two-way diff, the above degenerates to "show all parent
> >> entries that come before the first entry in 't'", which is correct.
> >> For the combined diff, the current intersect_paths() makes sure that
> >> each path appears in all the pair-wise diff between t and tp[],
> >> which again means that the above logic match the current behaviour.
> >
> > Yes, correct (modulo we *will* go to skip label if any one of the
> > parent's entry sorts after some other parent). By definition of combined
> > diff we show a path only if it shows in every diff D(T,Pi), and if 
> >
> >     2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
> >
> > some pj sorts after p[imin] that would mean that Pj does not have
> > p[imin] and since t > p[imin] (which means T does not have p[imin]
> > either) diff D(T,Pj) does not have p[imin]. And because of that we know
> > the whole combined-diff will not have p[imin] as, by definition,
> > combined diff is sets intersection and one of the sets does not have
> > that path.
> >
> >   ( In usual words p[imin] is not changed between Pj..T - it was
> >     e.g. removed in Pj~, so merging parents to T does not bring any new
> >     information wrt path p[imin] and that is why we do not want to show
> >     p[imin] in combined-diff output - no new change about that path )
> >
> > So nothing to append to the output, and update minimum tree entries,
> > preparing for the next step.
> 
> That's all in line with the current and traditional definition of
> combined diff.

Ok.


> This is a tangent that is outside the scope of this current topic,
> but I wonder if you found it disturbing that we treat the result 't'
> that has a path and the result 't' that does not have a path with
> respect to a parent that does not have the path in a somewhat
> assymmetric way.
> 
> With a merge M between commits A and B, where they all have the same
> path with different contents, we obviously show that path in the
> combined diff format.  A merge N that records exactly the same tree
> as M that merges the same commits A and B plus another commit C that
> does not have that path still shows the combined diff, with one
> extra column to express "everything in the result N has been added
> with respect to C which did not have the path at all".
> 
> However, a merge O between the same commits A and B, where A and B
> have a path and O loses it, shows the path in the combined format.
> A merge P among the same A, B and an extra parent C that does not
> have that path ceases to show it (this is the assymmetry).

Symmetry properties are very important and if something does not fit into
symmetry - then something somewhere is really wrong for sure, but I
think there is no asymmetry here. In your example

      M∆∙    N∆∙
     / \ . '''
    / .'\  '  '
    A∆   B∙    Cø
    \ ''/. '  '
     \ /   '''
      Oø     Pø

let's say a path can be:

    ø   - empty
    ∆   - triangle
    ∙   - bullet
    ∆∙  - triangle+bullet

then some symmetry operation is

    ∆∙  <-> ø
    ∙   <-> ∙
    ∆   <-> ∆

so you "mirror" ∆∙ to ø (M,N->O,P), and so ø is mirrored back to ∆∙
(O,P->M,N). But then, when we mirror the whole graph, Cø should be
mirrored to C'∆∙ and then that would be correct:

    A∆   B∙    C'∆∙
    \ ''/. '  '
     \ /   '''
      Oø     Pø

P to A,B,C' would show the path as N to A,B,C(without')


In other words a merge P among the same A, B and extra parent C' with
"contains everything" content is symmetrical to merge N to A,B and C
with empty content.


> It is a natural extension of "Do not show the path when the result
> matches one of the parent" rule, and in this case the result P takes
> contents, "the path does not exist", from one parent "C", so it is
> internally consistent, and I originally designed it that way on
> purpose, but somehow it feels a bit strange.

I hope it does not fill strange once the true symmetry is discovered,
and also P does not add a change compared to C, so the combined diff
should be empty as no new information is present in a merge itself.

A bit unusual on the first glance, but not strange, once you are used to
it and consistent.

Thanks,
Kirill

  reply	other threads:[~2014-04-07 20:23 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-24 16:21 [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Kirill Smelkov
2014-02-24 16:21 ` [PATCH 01/19] combine-diff: move show_log_first logic/action out of paths scanning Kirill Smelkov
2014-02-24 16:21 ` [PATCH 02/19] combine-diff: move changed-paths scanning logic into its own function Kirill Smelkov
2014-02-24 16:21 ` [PATCH 03/19] tree-diff: no need to manually verify that there is no mode change for a path Kirill Smelkov
2014-02-24 16:21 ` [PATCH 04/19] tree-diff: no need to pass match to skip_uninteresting() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 05/19] tree-diff: show_tree() is not needed Kirill Smelkov
2014-02-24 16:21 ` [PATCH 06/19] tree-diff: consolidate code for emitting diffs and recursion in one place Kirill Smelkov
2014-02-24 16:21 ` [PATCH 07/19] tree-diff: don't assume compare_tree_entry() returns -1,0,1 Kirill Smelkov
2014-02-24 16:21 ` [PATCH 08/19] tree-diff: move all action-taking code out of compare_tree_entry() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 09/19] tree-diff: rename compare_tree_entry -> tree_entry_pathcmp Kirill Smelkov
2014-02-24 16:21 ` [PATCH 10/19] tree-diff: show_path prototype is not needed anymore Kirill Smelkov
2014-02-24 16:21 ` [PATCH 11/19] tree-diff: simplify tree_entry_pathcmp Kirill Smelkov
2014-03-24 21:25   ` Junio C Hamano
2014-03-25  9:23     ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 12/19] tree-diff: remove special-case diff-emitting code for empty-tree cases Kirill Smelkov
2014-03-24 21:18   ` Junio C Hamano
2014-03-25  9:20     ` Kirill Smelkov
2014-03-25 17:45       ` Junio C Hamano
2014-03-26 18:32         ` Kirill Smelkov
2014-03-25 22:07       ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 13/19] tree-diff: diff_tree() should now be static Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 14/19] tree-diff: rework diff_tree interface to be sha1 based Kirill Smelkov
2014-03-24 21:36   ` Junio C Hamano
2014-03-25  9:22     ` Kirill Smelkov
2014-03-25 17:46       ` Junio C Hamano
2014-03-26 19:52         ` Kirill Smelkov
2014-03-26 21:34           ` Junio C Hamano
2014-03-27 14:24             ` Kirill Smelkov
2014-03-27 18:48               ` Junio C Hamano
2014-03-27 19:43                 ` Kirill Smelkov
2014-03-28  6:52                 ` Johannes Sixt
2014-03-28 17:06                   ` Junio C Hamano
2014-03-28 17:46                     ` Johannes Sixt
2014-03-28 18:36                       ` Junio C Hamano
2014-03-28 19:08                         ` Johannes Sixt
2014-03-28 19:27                           ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 15/19] tree-diff: no need to call "full" diff_tree_sha1 from show_path() Kirill Smelkov
2014-03-27 14:21   ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 16/19] tree-diff: reuse base str(buf) memory on sub-tree recursion Kirill Smelkov
2014-03-24 21:43   ` Junio C Hamano
2014-03-25  9:23     ` Kirill Smelkov
2014-03-27 14:22       ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 17/19] Portable alloca for Git Kirill Smelkov
2014-02-28 10:58   ` Thomas Schwinge
2014-02-28 13:44   ` Erik Faye-Lund
2014-02-28 13:50     ` Erik Faye-Lund
2014-02-28 17:00       ` Kirill Smelkov
2014-02-28 17:19         ` Erik Faye-Lund
2014-03-05  9:31           ` Kirill Smelkov
2014-03-24 21:47             ` Junio C Hamano
2014-03-27 14:22               ` Kirill Smelkov
2014-04-09 12:48                 ` Kirill Smelkov
2014-04-09 13:01                   ` Erik Faye-Lund
2014-04-10 17:30                     ` Junio C Hamano
2014-02-24 16:21 ` [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Kirill Smelkov
2014-03-27 14:23   ` Kirill Smelkov
2014-04-04 18:42     ` Junio C Hamano
2014-04-06 21:46       ` Kirill Smelkov
2014-04-07 17:29         ` Junio C Hamano
2014-04-07 20:26           ` Kirill Smelkov [this message]
2014-04-07 18:07         ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 19/19] combine-diff: speed it up, by using multiparent diff tree-walker directly Kirill Smelkov
2014-02-24 23:43 ` [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Duy Nguyen
2014-02-25 10:38   ` Kirill Smelkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140407202616.GA4140@mini.zxlink \
    --to=kirr@navytux.spb.ru \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kirr@mns.spb.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.