git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kirill Smelkov <kirr@navytux.spb.ru>
To: Junio C Hamano <gitster@pobox.com>
Cc: kirr@mns.spb.ru, git@vger.kernel.org
Subject: Re: [PATCH 12/19] tree-diff: remove special-case diff-emitting code for empty-tree cases
Date: Wed, 26 Mar 2014 22:32:30 +0400	[thread overview]
Message-ID: <20140326183230.GA16002@mini.zxlink> (raw)
In-Reply-To: <xmqqzjkej5ju.fsf@gitster.dls.corp.google.com> <xmqq8urymaua.fsf@gitster.dls.corp.google.com>

On Tue, Mar 25, 2014 at 10:45:01AM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@navytux.spb.ru> writes:
> 
> >> >  static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
> >> >  {
> >> >  	struct name_entry *e1, *e2;
> >> >  	int cmp;
> >> >  
> >> > +	if (!t1->size)
> >> > +		return t2->size ? +1 /* +∞ > c */  : 0 /* +∞ = +∞ */;
> >> > +	else if (!t2->size)
> >> > +		return -1;	/* c < +∞ */
> >> 
> >> Where do these "c" come from?  I somehow feel that these comments
> >> are making it harder to understand what is going on.
> >
> > "c" means some finite "c"onstant here. When I was studying at school and
> > at the university, it was common to denote constants via this letter -
> > i.e. in algebra and operators they often show scalar multiplication as
> >
> >     c·A     (or α·A)
> >
> > etc. I understand it could maybe be confusing (but it came to me as
> > surprise), so would the following be maybe better:
> >
> >         if (!t1->size)
> >         	return t2->size ? +1 /* +∞ > const */  : 0 /* +∞ = +∞ */;
> >         else if (!t2->size)
> >         	return -1;	/* const < +∞ */
> >
> > ?
> 
> Not better at all, I am afraid.  A "const" in the code usually means
> "something that does not change, as opposed to a variable", but what
> you are saying here is "t1 does not have an element but t2 still
> does. Pretend as if t1 has a virtual/fake element that is larger
> than any real element t2 may happen to have at the head of its
> queue", and you are labeling that "real element at the head of t2"
> as "const", but as the walker advances, the head element in t1 and
> t2 will change---they are not "const" in that sense, and the reader
> is left scratching his head seeing "const" there, wondering what the
> author of the comment meant.

I agree.


> "real" or "concrete" might be better a phrasing, but I do not think
> having "/* +inf > concrete */" there helps the reader understand
> what is going on in the first place.  Perhaps:
> 
>         /*
>          * When one side is empty, pretend that it has an element
>          * that sorts later than what the other non-empty side has,
>          * so that the caller advances the non-empty side without
>          * touching the empty side.
>          */
>         if (!t1->size)
>                 return !t2->size ? 0 : 1;
>         else if (!t2->size)
>                 return -1;
> 
> or something?

Yes, that describe the reasoning without stranger symbols. How about
taking it further with

          * NOTE empty (=invalid) descriptor(s) take part in comparison as +infty,
          *      so that they sort *after* valid tree entries.
          *
          *      Due to this convention, if trees are scanned in sorted order, all
          *      non-empty descriptors will be processed first.
          */
         static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
         {
                struct name_entry *e1, *e2;
                int cmp;
         
                /* empty descriptors sort after valid tree entries */
                if (!t1->size)
                        return t2->size ? +1 : 0;
                else if (!t2->size)
                        return -1;

?

On Tue, Mar 25, 2014 at 03:07:33PM -0700, Junio C Hamano wrote:
> Kirill Smelkov <kirr@navytux.spb.ru> writes:
> 
> > On Mon, Mar 24, 2014 at 02:18:10PM -0700, Junio C Hamano wrote:
> >> Kirill Smelkov <kirr@mns.spb.ru> writes:
> >> 
> >> > via teaching tree_entry_pathcmp() how to compare empty tree descriptors:
> >> 
> >> Drop this line, as you explain the "pretend empty compares bigger
> >> than anything else" idea later anyway?  This early part of the
> >> proposed log message made me hiccup while reading it.
> >
> > Hmm, I was trying to show the big picture first and only then details...
> 
> The subject should be sufficient for the big picture.  "OK, we are
> removing the special casing" is what we expect the reader to get.
> Then, this
> 
> >> > While walking trees, we iterate their entries from lowest to highest in
> >> > sort order, so empty tree means all entries were already went over.
> 
> sets the background.  "OK, the code walks two trees, both have
> sorted elements, in parallel." is what we want the reader to
> understand.  Then the next part gives the idea of pretending that
> the empty-side always compare later than the non-empty side while
> doing that parallel walking (similar to "merge").
> 
> So, yes, I think it is a good presentation order to give big picture
> punch-line first on the subject, some background and then the
> solution.

Ok, let it be this way and let's drop it.

Here is updated patch:
(please keep author email)

---- 8< ----
From: Kirill Smelkov <kirr@mns.spb.ru>
Date: Mon, 24 Feb 2014 20:21:44 +0400
Subject: [PATCH v2] tree-diff: remove special-case diff-emitting code for empty-tree cases

While walking trees, we iterate their entries from lowest to highest in
sort order, so empty tree means all entries were already went over.

If we artificially assign +infinity value to such tree "entry", it will
go after all usual entries, and through the usual driver loop we will be
taking the same actions, which were hand-coded for special cases, i.e.

    t1 empty, t2 non-empty
        pathcmp(+∞, t2) -> +1
        show_path(/*t1=*/NULL, t2);     /* = t1 > t2 case in main loop */

    t1 non-empty, t2-empty
        pathcmp(t1, +∞) -> -1
        show_path(t1, /*t2=*/NULL);     /* = t1 < t2 case in main loop */

In other words when we have t1 and t2, we return a sign that tells the
caller to indicate the "earlier" one to be emitted, and by returning the
sign that causes the non-empty side to be emitted, we will automatically
cause the entries from the remaining side to be emitted, without
attempting to touch the empty side at all.  We can teach
tree_entry_pathcmp() to pretend that an empty tree has an element that
sorts after anything else to achieve this.

Right now we never go to when compared tree descriptors are both
infinity, as this condition is checked in the loop beginning as
finishing criteria, but will do so in the future, when there will be
several parents iterated simultaneously, and some pair of them would run
to the end.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
---

Changes since v1:

 - reworked commit log and comments as per Junio suggestions.


 tree-diff.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index f8b2607..6177658 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -12,12 +12,24 @@
  *
  * NOTE files and directories *always* compare differently, even when having
  *      the same name - thanks to base_name_compare().
+ *
+ * NOTE empty (=invalid) descriptor(s) take part in comparison as +infty,
+ *      so that they sort *after* valid tree entries.
+ *
+ *      Due to this convention, if trees are scanned in sorted order, all
+ *      non-empty descriptors will be processed first.
  */
 static int tree_entry_pathcmp(struct tree_desc *t1, struct tree_desc *t2)
 {
 	struct name_entry *e1, *e2;
 	int cmp;
 
+	/* empty descriptors sort after valid tree entries */
+	if (!t1->size)
+		return t2->size ? +1 : 0;
+	else if (!t2->size)
+		return -1;
+
 	e1 = &t1->entry;
 	e2 = &t2->entry;
 	cmp = base_name_compare(e1->path, tree_entry_len(e1), e1->mode,
@@ -150,18 +162,8 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2,
 			skip_uninteresting(t1, &base, opt);
 			skip_uninteresting(t2, &base, opt);
 		}
-		if (!t1->size) {
-			if (!t2->size)
-				break;
-			show_path(&base, opt, /*t1=*/NULL, t2);
-			update_tree_entry(t2);
-			continue;
-		}
-		if (!t2->size) {
-			show_path(&base, opt, t1, /*t2=*/NULL);
-			update_tree_entry(t1);
-			continue;
-		}
+		if (!t1->size && !t2->size)
+			break;
 
 		cmp = tree_entry_pathcmp(t1, t2);
 
-- 
1.9.rc0.143.g6fd479e

  reply	other threads:[~2014-03-26 18:41 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-24 16:21 [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Kirill Smelkov
2014-02-24 16:21 ` [PATCH 01/19] combine-diff: move show_log_first logic/action out of paths scanning Kirill Smelkov
2014-02-24 16:21 ` [PATCH 02/19] combine-diff: move changed-paths scanning logic into its own function Kirill Smelkov
2014-02-24 16:21 ` [PATCH 03/19] tree-diff: no need to manually verify that there is no mode change for a path Kirill Smelkov
2014-02-24 16:21 ` [PATCH 04/19] tree-diff: no need to pass match to skip_uninteresting() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 05/19] tree-diff: show_tree() is not needed Kirill Smelkov
2014-02-24 16:21 ` [PATCH 06/19] tree-diff: consolidate code for emitting diffs and recursion in one place Kirill Smelkov
2014-02-24 16:21 ` [PATCH 07/19] tree-diff: don't assume compare_tree_entry() returns -1,0,1 Kirill Smelkov
2014-02-24 16:21 ` [PATCH 08/19] tree-diff: move all action-taking code out of compare_tree_entry() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 09/19] tree-diff: rename compare_tree_entry -> tree_entry_pathcmp Kirill Smelkov
2014-02-24 16:21 ` [PATCH 10/19] tree-diff: show_path prototype is not needed anymore Kirill Smelkov
2014-02-24 16:21 ` [PATCH 11/19] tree-diff: simplify tree_entry_pathcmp Kirill Smelkov
2014-03-24 21:25   ` Junio C Hamano
2014-03-25  9:23     ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 12/19] tree-diff: remove special-case diff-emitting code for empty-tree cases Kirill Smelkov
2014-03-24 21:18   ` Junio C Hamano
2014-03-25  9:20     ` Kirill Smelkov
2014-03-25 17:45       ` Junio C Hamano
2014-03-26 18:32         ` Kirill Smelkov [this message]
2014-03-25 22:07       ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 13/19] tree-diff: diff_tree() should now be static Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 14/19] tree-diff: rework diff_tree interface to be sha1 based Kirill Smelkov
2014-03-24 21:36   ` Junio C Hamano
2014-03-25  9:22     ` Kirill Smelkov
2014-03-25 17:46       ` Junio C Hamano
2014-03-26 19:52         ` Kirill Smelkov
2014-03-26 21:34           ` Junio C Hamano
2014-03-27 14:24             ` Kirill Smelkov
2014-03-27 18:48               ` Junio C Hamano
2014-03-27 19:43                 ` Kirill Smelkov
2014-03-28  6:52                 ` Johannes Sixt
2014-03-28 17:06                   ` Junio C Hamano
2014-03-28 17:46                     ` Johannes Sixt
2014-03-28 18:36                       ` Junio C Hamano
2014-03-28 19:08                         ` Johannes Sixt
2014-03-28 19:27                           ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 15/19] tree-diff: no need to call "full" diff_tree_sha1 from show_path() Kirill Smelkov
2014-03-27 14:21   ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 16/19] tree-diff: reuse base str(buf) memory on sub-tree recursion Kirill Smelkov
2014-03-24 21:43   ` Junio C Hamano
2014-03-25  9:23     ` Kirill Smelkov
2014-03-27 14:22       ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 17/19] Portable alloca for Git Kirill Smelkov
2014-02-28 10:58   ` Thomas Schwinge
2014-02-28 13:44   ` Erik Faye-Lund
2014-02-28 13:50     ` Erik Faye-Lund
2014-02-28 17:00       ` Kirill Smelkov
2014-02-28 17:19         ` Erik Faye-Lund
2014-03-05  9:31           ` Kirill Smelkov
2014-03-24 21:47             ` Junio C Hamano
2014-03-27 14:22               ` Kirill Smelkov
2014-04-09 12:48                 ` Kirill Smelkov
2014-04-09 13:01                   ` Erik Faye-Lund
2014-04-10 17:30                     ` Junio C Hamano
2014-02-24 16:21 ` [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Kirill Smelkov
2014-03-27 14:23   ` Kirill Smelkov
2014-04-04 18:42     ` Junio C Hamano
2014-04-06 21:46       ` Kirill Smelkov
2014-04-07 17:29         ` Junio C Hamano
2014-04-07 20:26           ` Kirill Smelkov
2014-04-07 18:07         ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 19/19] combine-diff: speed it up, by using multiparent diff tree-walker directly Kirill Smelkov
2014-02-24 23:43 ` [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Duy Nguyen
2014-02-25 10:38   ` Kirill Smelkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140326183230.GA16002@mini.zxlink \
    --to=kirr@navytux.spb.ru \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kirr@mns.spb.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).