All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>, git@vger.kernel.org
Subject: Re: [RFC PATCH] Not computing changed path filter for root commits
Date: Mon, 9 Oct 2023 13:20:31 -0400	[thread overview]
Message-ID: <ZSQ2XwbTM4DDLfJq@nand.local> (raw)
In-Reply-To: <20231002225546.409837-1-jonathantanmy@google.com>

On Mon, Oct 02, 2023 at 03:55:46PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > diff --git a/revision.c b/revision.c
> > index 2f4c53ea20..1d36df49e2 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -837,14 +837,24 @@ static int rev_compare_tree(struct rev_info *revs,
> >  static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
> >  {
> >  	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
> > +	int bloom_ret = 1;
> >
> >  	if (!t1)
> >  		return 0;
> >
> > +	if (revs->bloom_keys_nr) {
> > +		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
> > +		if (!bloom_ret)
> > +			return 1;
> > +	}
> > +
> >  	tree_difference = REV_TREE_SAME;
> >  	revs->pruning.flags.has_changes = 0;
> >  	diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);
> >
> > +	if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
> > +		count_bloom_filter_false_positive++;
> > +
> >  	return tree_difference == REV_TREE_SAME;
> >  }
>
> I'll concentrate on getting this patch in, and will look at (and
> discuss) the other Bloom filter-related emails later.

Sounds good. I know that I have some pending mail from SZEDER as well,
so I'll try and apply any feedback from both of you before sending a
reroll so that we can get this all done in one shot.

> This looks good, possibly except a code path in try_to_simplify_commit()
> that calls this rev_same_tree_as_empty() function when
> rev_compare_tree() between a commit and its parent returns REV_TREE_NEW.
> So there are 2 issues: How can rev_compare_tree() ever return
> REV_TREE_NEW? And it doesn't seem right to check Bloom filters in this
> code path, since rev_same_tree_as_empty() was invoked here while we are
> enumerating through a commit's parents, which necessarily implies that
> the commit has parents, but here we're using the Bloom filter as if the
> commit is known to have no parents.

> As for the first issue, rev_compare_tree() returns REV_TREE_NEW when the
> parent's tree is NULL. I'm not sure how this can happen - the tree can
> be NULL if the parent commit is not parsed, but at this point I think
> that it has been parsed. And I think every commit has a tree. This goes
> back all the way to 3a5e860815 (revision: make tree comparison functions
> take commits rather than trees, 2008-11-03) and even beyond that (I
> didn't dig further).

The more I think about this, the more confused I become ;-). I was able
to track this all the way back to Linus's 461cf59f89 (rev-list: stop
when the file disappears, 2006-01-18), but am convinced that both of
these cases (we have an analogous one for when t2 is NULL and we return
REV_TREE_OLD) are dead code.

I replaced the "if (!t1) return REV_TREE_NEW;" with "if (!t1)
BUG("oops")", and was able to get the whole test suite to pass. So... I
am pretty sure that this is dead code, but not sure enough to remove it
myself ;-).

To your point about seeing the tree as NULL before parsing the parent, I
don't think that is the case here, since we parse the parent immediately
before calling rev_compare_tree() (and indeed Git will refuse to read
commit objects which do not list a tree).

> As for the second issue, we can probably solve this by being defensive
> in rev_same_tree_as_empty() by only using the Bloom filter when the
> commit has no parents. Not sure if this is being overly defensive,
> though.

I am also unsure whether we are being overly defensive here or not. But
I agree that it does feel safer to apply something like:

--- 8< ---
diff --git a/revision.c b/revision.c
index 3d78ea6a9a..21b3085465 100644
--- a/revision.c
+++ b/revision.c
@@ -834,7 +834,8 @@ static int rev_compare_tree(struct rev_info *revs,
 	return tree_difference;
 }

-static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
+static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit,
+				  int use_bloom_filter)
 {
 	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
 	int bloom_ret = 1;
@@ -842,7 +843,7 @@ static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
 	if (!t1)
 		return 0;

-	if (revs->bloom_keys_nr) {
+	if (use_bloom_filter && revs->bloom_keys_nr) {
 		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
 		if (!bloom_ret)
 			return 1;
@@ -892,7 +893,7 @@ static int compact_treesame(struct rev_info *revs, struct commit *commit, unsign
 		if (nth_parent != 0)
 			die("compact_treesame %u", nth_parent);
 		old_same = !!(commit->object.flags & TREESAME);
-		if (rev_same_tree_as_empty(revs, commit))
+		if (rev_same_tree_as_empty(revs, commit, 1))
 			commit->object.flags |= TREESAME;
 		else
 			commit->object.flags &= ~TREESAME;
@@ -988,7 +989,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		return;

 	if (!commit->parents) {
-		if (rev_same_tree_as_empty(revs, commit))
+		if (rev_same_tree_as_empty(revs, commit, 1))
 			commit->object.flags |= TREESAME;
 		return;
 	}
@@ -1069,7 +1070,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)

 		case REV_TREE_NEW:
 			if (revs->remove_empty_trees &&
-			    rev_same_tree_as_empty(revs, p)) {
+			    rev_same_tree_as_empty(revs, p, 0)) {
 				/* We are adding all the specified
 				 * paths from this parent, so the
 				 * history beyond this parent is not
--- >8 ---

on top.

> There is also the issue that count_bloom_filter_false_positive is
> incremented even when no Bloom filters are present, but I think this is
> fine (it matches the behavior of rev_compare_tree()).

Yep, agreed.

Thanks,
Taylor

  reply	other threads:[~2023-10-09 17:20 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-11 22:31 [RFC PATCH] Not computing changed path filter for root commits Jonathan Tan
2023-09-15 18:25 ` Junio C Hamano
2023-09-15 20:29 ` SZEDER Gábor
2023-09-19 18:19   ` Taylor Blau
2023-10-02 22:55     ` Jonathan Tan
2023-10-09 17:20       ` Taylor Blau [this message]
2023-10-09 17:26         ` Taylor Blau
2023-10-09 20:59           ` Jonathan Tan
2023-10-10 19:47             ` Taylor Blau
2023-09-19 18:21 ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZSQ2XwbTM4DDLfJq@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.