All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Kirill Smelkov <kirr@navytux.spb.ru>
Cc: kirr@mns.spb.ru, git@vger.kernel.org
Subject: Re: [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Date: Fri, 04 Apr 2014 11:42:39 -0700	[thread overview]
Message-ID: <xmqqppkxos0w.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20140327142354.GD17333@mini.zxlink> (Kirill Smelkov's message of "Thu, 27 Mar 2014 18:23:54 +0400")

Kirill Smelkov <kirr@navytux.spb.ru> writes:

> +extern
> +struct combine_diff_path *diff_tree_paths(

These two on the same line, please.

> +	struct combine_diff_path *p, const unsigned char *sha1,
> +	const unsigned char **parent_sha1, int nparent,
> +	struct strbuf *base, struct diff_options *opt);
>  extern int diff_tree_sha1(const unsigned char *old, const unsigned char *new,
>  			  const char *base, struct diff_options *opt);
> ...
> +/*
> + * convert path -> opt->diff_*() callbacks
> + *
> + * emits diff to first parent only, and tells diff tree-walker that we are done
> + * with p and it can be freed.
> + */
> +static int emit_diff_first_parent_only(struct diff_options *opt, struct combine_diff_path *p)
>  {

Very straight-forward; good.

> +static struct combine_diff_path *path_appendnew(struct combine_diff_path *last,
> +	int nparent, const struct strbuf *base, const char *path, int pathlen,
> +	unsigned mode, const unsigned char *sha1)
> +{
> +	struct combine_diff_path *p;
> +	int len = base->len + pathlen;
> +	int alloclen = combine_diff_path_size(nparent, len);
> +
> +	/* if last->next is !NULL - it is a pre-allocated memory, we can reuse */
> +	p = last->next;
> +	if (p && (alloclen > (intptr_t)p->next)) {
> +		free(p);
> +		p = NULL;
> +	}
> +
> +	if (!p) {
> +		p = xmalloc(alloclen);
> +
> +		/*
> +		 * until we go to it next round, .next holds how many bytes we
> +		 * allocated (for faster realloc - we don't need copying old data).
> +		 */
> +		p->next = (struct combine_diff_path *)(intptr_t)alloclen;

This reuse of the .next field is somewhat yucky, but it is very
localized inside a function that has a single callsite to this
function, so let's let it pass.

> +static struct combine_diff_path *emit_path(struct combine_diff_path *p,
> +	struct strbuf *base, struct diff_options *opt, int nparent,
> +	struct tree_desc *t, struct tree_desc *tp,
> +	int imin)
>  {

Again, fairly straight-forward and good.

> +/*
> + * generate paths for combined diff D(sha1,parents_sha1[])
> + ...
> +static struct combine_diff_path *ll_diff_tree_paths(
> +	struct combine_diff_path *p, const unsigned char *sha1,
> +	const unsigned char **parents_sha1, int nparent,
> +	struct strbuf *base, struct diff_options *opt)
> +{
> +	struct tree_desc t, *tp;
> +	void *ttree, **tptree;
> +	int i;
> +
> +	tp     = xalloca(nparent * sizeof(tp[0]));
> +	tptree = xalloca(nparent * sizeof(tptree[0]));
> +
> +	/*
> +	 * load parents first, as they are probably already cached.
> +	 *
> +	 * ( log_tree_diff() parses commit->parent before calling here via
> +	 *   diff_tree_sha1(parent, commit) )
> +	 */
> +	for (i = 0; i < nparent; ++i)
> +		tptree[i] = fill_tree_descriptor(&tp[i], parents_sha1[i]);
> +	ttree = fill_tree_descriptor(&t, sha1);
>  
>  	/* Enable recursion indefinitely */
>  	opt->pathspec.recursive = DIFF_OPT_TST(opt, RECURSIVE);
>  
>  	for (;;) {
> -		int cmp;
> +		int imin, cmp;
>  
>  		if (diff_can_quit_early(opt))
>  			break;
> +
>  		if (opt->pathspec.nr) {
> -			skip_uninteresting(&t1, base, opt);
> -			skip_uninteresting(&t2, base, opt);
> +			skip_uninteresting(&t, base, opt);
> +			for (i = 0; i < nparent; i++)
> +				skip_uninteresting(&tp[i], base, opt);
>  		}
> -		if (!t1.size && !t2.size)
> -			break;
>  
> -		cmp = tree_entry_pathcmp(&t1, &t2);
> +		/* comparing is finished when all trees are done */
> +		if (!t.size) {
> +			int done = 1;
> +			for (i = 0; i < nparent; ++i)
> +				if (tp[i].size) {
> +					done = 0;
> +					break;
> +				}
> +			if (done)
> +				break;
> +		}
> +
> +		/*
> +		 * lookup imin = argmin(x1...xn),
> +		 * mark entries whether they =tp[imin] along the way
> +		 */
> +		imin = 0;
> +		tp[0].entry.mode &= ~S_IFXMIN_NEQ;
> +
> +		for (i = 1; i < nparent; ++i) {
> +			cmp = tree_entry_pathcmp(&tp[i], &tp[imin]);
> +			if (cmp < 0) {
> +				imin = i;
> +				tp[i].entry.mode &= ~S_IFXMIN_NEQ;
> +			}
> +			else if (cmp == 0) {
> +				tp[i].entry.mode &= ~S_IFXMIN_NEQ;
> +			}
> +			else {
> +				tp[i].entry.mode |= S_IFXMIN_NEQ;
> +			}
> +		}
> +
> +		/* fixup markings for entries before imin */
> +		for (i = 0; i < imin; ++i)
> +			tp[i].entry.mode |= S_IFXMIN_NEQ;	/* x[i] > x[imin] */
> +

These two loop made my reading hiccup for a while.  With these you
are scanning the tp[] array 1.5 times (and doing the bitwise
assignment to entry.mode 1.5 * nparent times), but I suspect it may
have been a lot easier to read if the first loop only identified the
imin, and the second loop only did the entry.mode for _all_ nparents.

> +		/* compare a vs x[imin] */
> +		cmp = tree_entry_pathcmp(&t, &tp[imin]);
> +
> +		/* a = xi */
> +		if (cmp == 0) {
> +			/* are either xk > xi or diff(a,xk) != ø ? */
> +			if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER)) {
> +				for (i = 0; i < nparent; ++i) {
> +					/* x[i] > x[imin] */
> +					if (tp[i].entry.mode & S_IFXMIN_NEQ)
> +						continue;
> +
> +					/* diff(a,xk) != ø */
> +					if (hashcmp(t.entry.sha1, tp[i].entry.sha1) ||
> +					    (t.entry.mode != tp[i].entry.mode))
> +						continue;
> +
> +					goto skip_emit_t_tp;
> +				}
> +			}

Please bear with me.  The notation scares me as I am not good at math.

In short, the above loop is about:

    We are looking at path in 't' and some parents have the same
    path.  If any of these parents have that path with the contents
    identical to 't', then do not emit this path.

which makes sense to me, but these notation also made my reading
hiccup, especially because it is hard to guess what "xk" refers to
(e.g. "any k where 0 <= k < nparent && i != k"? "all such k"?).  I
still haven't figured out what you meant to say with "xk", but I
think I got what the code wants to do.

How does the "the (virtual) path from a tree that has ran out of
entries sorts later than anything else" comparison rule influence
the picture?  A parent that has ran out would have _NEQ bit set and
would not count as having the same contents as the path from 't'.
If 't' has ran out, the only way t and tp[imin] could compare equal
is when tp[imin] has also ran out, but that can happen only when all
the parents are done with, so we would have broken out of the loop
even before we try to figure out imin.  So there is no funnies
there, which is good.

> +			/* D += {δ(a,xk) if xk=xi;  "+a" if xk > xi} */
> +			p = emit_path(p, base, opt, nparent,
> +					&t, tp, imin);
> +
> +		skip_emit_t_tp:
> +			/* a↓,  ∀ xk=ximin  xk↓ */
> +			update_tree_entry(&t);
> +			update_tp_entries(tp, nparent);
>  		}
>  
> -		/* t1 < t2 */
> +		/* a < xi */
>  		else if (cmp < 0) {
> -			show_path(base, opt, &t1, /*t2=*/NULL);
> -			update_tree_entry(&t1);
> +			/* D += "+a" */
> +			p = emit_path(p, base, opt, nparent,
> +					&t, /*tp=*/NULL, -1);
> +
> +			/* a↓ */
> +			update_tree_entry(&t);

This is straight-forward.  No parent has path 't' has, so only the
entry from 't' is given, and we deal with the next entry in 't'
without touching any of the parents in the next iteration.  Good.

>  		}
>  
> -		/* t1 > t2 */
> +		/* a > xi */
>  		else {
> -			show_path(base, opt, /*t1=*/NULL, &t2);
> -			update_tree_entry(&t2);
> +			/* ∀j xj=ximin -> D += "-xi" */

Did you mean "-xj"?

> +			if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER)) {
> +				for (i = 0; i < nparent; ++i)
> +					if (tp[i].entry.mode & S_IFXMIN_NEQ)
> +						goto skip_emit_tp;
> +			}
> +
> +			p = emit_path(p, base, opt, nparent,
> +					/*t=*/NULL, tp, imin);
> +
> +		skip_emit_tp:
> +			/* ∀ xk=ximin  xk↓ */
> +			update_tp_entries(tp, nparent);

There are parents whose path sort earlier than what is in 't'
(i.e. they were lost in the result---we would want to show
removal).  What makes us jump to the skip label?

    We are looking at path in 't', and some parents have paths that
    sort earlier than that path.  We will not go to skip label if
    any one of the parent's entry sorts after some other parent (or
    the parent in question has ran out its entries), which means we
    show the entry from the parents only when all the parents have
    that same path, which is missing from 't'.

I am not sure if I am reading this correctly, though.

For the two-way diff, the above degenerates to "show all parent
entries that come before the first entry in 't'", which is correct.
For the combined diff, the current intersect_paths() makes sure that
each path appears in all the pair-wise diff between t and tp[],
which again means that the above logic match the current behaviour.


> +struct combine_diff_path *diff_tree_paths(
> +	struct combine_diff_path *p, const unsigned char *sha1,
> +	const unsigned char **parents_sha1, int nparent,
> +	struct strbuf *base, struct diff_options *opt)
> +{
> +	p = ll_diff_tree_paths(p, sha1, parents_sha1, nparent, base, opt);
> +
> +	/*
> +	 * free pre-allocated last element, if any
> +	 * (see path_appendnew() for details about why)
> +	 */
> +	if (p->next) {
> +		free(p->next);
> +		p->next = NULL;
> +	}
> +
> +	return p;
>  }
>  
>  /*
> @@ -308,6 +664,27 @@ static void try_to_follow_renames(const unsigned char *old, const unsigned char
>  	q->nr = 1;
>  }
>  
> +static int ll_diff_tree_sha1(const unsigned char *old, const unsigned char *new,
> +			     struct strbuf *base, struct diff_options *opt)
> +{
> +	struct combine_diff_path phead, *p;
> +	const unsigned char *parents_sha1[1] = {old};
> +	pathchange_fn_t pathchange_old = opt->pathchange;
> +
> +	phead.next = NULL;
> +	opt->pathchange = emit_diff_first_parent_only;
> +	diff_tree_paths(&phead, new, parents_sha1, 1, base, opt);

Hmph.  I would have expected

	const unsigned char **parents_sha1 = &old;

or even

	diff_tree_paths(&phead, new, &old, 1, base, opt);

here.


Thanks.

  reply	other threads:[~2014-04-04 18:42 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-24 16:21 [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Kirill Smelkov
2014-02-24 16:21 ` [PATCH 01/19] combine-diff: move show_log_first logic/action out of paths scanning Kirill Smelkov
2014-02-24 16:21 ` [PATCH 02/19] combine-diff: move changed-paths scanning logic into its own function Kirill Smelkov
2014-02-24 16:21 ` [PATCH 03/19] tree-diff: no need to manually verify that there is no mode change for a path Kirill Smelkov
2014-02-24 16:21 ` [PATCH 04/19] tree-diff: no need to pass match to skip_uninteresting() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 05/19] tree-diff: show_tree() is not needed Kirill Smelkov
2014-02-24 16:21 ` [PATCH 06/19] tree-diff: consolidate code for emitting diffs and recursion in one place Kirill Smelkov
2014-02-24 16:21 ` [PATCH 07/19] tree-diff: don't assume compare_tree_entry() returns -1,0,1 Kirill Smelkov
2014-02-24 16:21 ` [PATCH 08/19] tree-diff: move all action-taking code out of compare_tree_entry() Kirill Smelkov
2014-02-24 16:21 ` [PATCH 09/19] tree-diff: rename compare_tree_entry -> tree_entry_pathcmp Kirill Smelkov
2014-02-24 16:21 ` [PATCH 10/19] tree-diff: show_path prototype is not needed anymore Kirill Smelkov
2014-02-24 16:21 ` [PATCH 11/19] tree-diff: simplify tree_entry_pathcmp Kirill Smelkov
2014-03-24 21:25   ` Junio C Hamano
2014-03-25  9:23     ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 12/19] tree-diff: remove special-case diff-emitting code for empty-tree cases Kirill Smelkov
2014-03-24 21:18   ` Junio C Hamano
2014-03-25  9:20     ` Kirill Smelkov
2014-03-25 17:45       ` Junio C Hamano
2014-03-26 18:32         ` Kirill Smelkov
2014-03-25 22:07       ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 13/19] tree-diff: diff_tree() should now be static Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 14/19] tree-diff: rework diff_tree interface to be sha1 based Kirill Smelkov
2014-03-24 21:36   ` Junio C Hamano
2014-03-25  9:22     ` Kirill Smelkov
2014-03-25 17:46       ` Junio C Hamano
2014-03-26 19:52         ` Kirill Smelkov
2014-03-26 21:34           ` Junio C Hamano
2014-03-27 14:24             ` Kirill Smelkov
2014-03-27 18:48               ` Junio C Hamano
2014-03-27 19:43                 ` Kirill Smelkov
2014-03-28  6:52                 ` Johannes Sixt
2014-03-28 17:06                   ` Junio C Hamano
2014-03-28 17:46                     ` Johannes Sixt
2014-03-28 18:36                       ` Junio C Hamano
2014-03-28 19:08                         ` Johannes Sixt
2014-03-28 19:27                           ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 15/19] tree-diff: no need to call "full" diff_tree_sha1 from show_path() Kirill Smelkov
2014-03-27 14:21   ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH v2 16/19] tree-diff: reuse base str(buf) memory on sub-tree recursion Kirill Smelkov
2014-03-24 21:43   ` Junio C Hamano
2014-03-25  9:23     ` Kirill Smelkov
2014-03-27 14:22       ` Kirill Smelkov
2014-02-24 16:21 ` [PATCH 17/19] Portable alloca for Git Kirill Smelkov
2014-02-28 10:58   ` Thomas Schwinge
2014-02-28 13:44   ` Erik Faye-Lund
2014-02-28 13:50     ` Erik Faye-Lund
2014-02-28 17:00       ` Kirill Smelkov
2014-02-28 17:19         ` Erik Faye-Lund
2014-03-05  9:31           ` Kirill Smelkov
2014-03-24 21:47             ` Junio C Hamano
2014-03-27 14:22               ` Kirill Smelkov
2014-04-09 12:48                 ` Kirill Smelkov
2014-04-09 13:01                   ` Erik Faye-Lund
2014-04-10 17:30                     ` Junio C Hamano
2014-02-24 16:21 ` [PATCH v2 18/19] tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Kirill Smelkov
2014-03-27 14:23   ` Kirill Smelkov
2014-04-04 18:42     ` Junio C Hamano [this message]
2014-04-06 21:46       ` Kirill Smelkov
2014-04-07 17:29         ` Junio C Hamano
2014-04-07 20:26           ` Kirill Smelkov
2014-04-07 18:07         ` Junio C Hamano
2014-02-24 16:21 ` [PATCH 19/19] combine-diff: speed it up, by using multiparent diff tree-walker directly Kirill Smelkov
2014-02-24 23:43 ` [PATCH v2 00/19] Multiparent diff tree-walker + combine-diff speedup Duy Nguyen
2014-02-25 10:38   ` Kirill Smelkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqppkxos0w.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=kirr@mns.spb.ru \
    --cc=kirr@navytux.spb.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.