Re: Tag peeling peculiarities

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Junio C Hamano <gitster@pobox.com>,
	git discussion list <git@vger.kernel.org>
Subject: Re: Tag peeling peculiarities
Date: Sat, 16 Mar 2013 05:34:41 -0400	[thread overview]
Message-ID: <20130316093441.GA26260@sigill.intra.peff.net> (raw)
In-Reply-To: <514431EA.2050402@alum.mit.edu>

On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote:

> My patch series is nearly done.  I will need another day or two to
> review and make it submission-ready, but I wanted to give you an idea of
> what I'm up to and I could also use your feedback on some points.

I was just sending out my cleaned up patches to do do fully-peeled, too.
I think the duplication is OK, though, as your topic would be for
"master" and mine can go to "maint".

> * Implement fully-peeled packed-refs files, as discussed upthread: peel
>   references outside of refs/tags, and keep rigorous track of
>   which references have REF_KNOWS_PEELED.

Looks pretty similar to mine. We even added similar tests.

I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel
line" optimization. I didn't bother, because this will never be
triggered by a git-written file (either we do not write the entry, or we
set fully-peeled). I wonder if any other implementation does peel every
ref, though.

> * Change pack-refs to use the peeled information from ref_entries if it
>   is available, rather than looking up the references again.

I don't know that it matters from a performance standpoint (we don't
really care how long pack-refs takes, as long as it is within reason,
because it is run as part of a gc).  But it would mean that any errors
in the file are propagated from one run of pack-refs to the next. I
don't know if we want to spend the extra time to be more robust.

> * repack_without_ref() writes peeled refs to the packed-refs file.
>   Use the old values if available; otherwise look them up.

Whereas here we probably _do_ want the performance optimization, because
we do care about the performance of a ref deletion.

> 1. There are multiple functions for peeling refs:
>    peel_ref() (and peel_object(), which was extracted from it);
>    peel_entry() (derived from pack-refs.c:pack_one_ref()).  It would be
>    nice to combine these into the One True Function.  But given the
>    problem that you mentioned above (which is rooted in parts of the
>    code that I don't know) I don't know what that version should do.

I'm not sure I understand the question. Just skimming your patches, it
looks like peel_entry could just call peel_object?

> 2. repack_without_ref() now writes peeled refs, peeling them if
>    necessary.  It does this *without* referring to the loose refs
>    to avoid having to load all of the loose references, which can be
>    time-consuming.  But this means that it might try to lookup SHA1s
>    that are not the current value of the corresponding reference any
>    more, and might even have been garbage collected.

Yuck. I really wonder if repack_without_ref should literally just be
writing out the exact same file, minus the lines for the deleted ref.
Is there a reason we need to do any processing at all? I guess the
answer is that you want to avoid re-parsing the current file, and just
write it out from memory. But don't we need to refresh the memory cache
from disk under the lock anyway? That was the pack-refs race that I
fixed recently.

> Is the code that
>    I use to do this, in peel_entry(), safe to call for nonexistent
>    SHA1s (I would like it to return 0 if the SHA1 is invalid)?:
> 
> 	o = parse_object(entry->u.value.sha1);
> 	if (o->type == OBJ_TAG) {
> 		o = deref_tag(o, entry->name, 0);
> 		if (o) {
> 			hashcpy(entry->u.value.peeled, o->sha1);
> 			entry->flag |= REF_KNOWS_PEELED;
> 			return 1;
> 		}
> 	}
> 	return 0;

I think this approach is safe, as parse_object will silently return NULL
for a missing object. You do need to check for "o != NULL" in your
conditional, though.

> 3. This same change to repack_with_ref() means that it could become
>    significantly slower to delete a packed ref if the packed-ref file
>    is not fully-peeled.  On the plus side, once this is done once, the
>    packed-refs files will be rewritten in fully-peeled form.  But if
>    two versions of Git are being used in a repository, this cost could
>    be incurred repeatedly.  Does anybody object?

I think it's OK in concept. But I still am wondering if it would be
simpler still to just pass the file through while locked.

> 4. Should I change the locking policy as discussed in this thread?:
> 
>        http://article.gmane.org/gmane.comp.version-control.git/212131

I think it's overall a sane direction. It does increase lock contention
slightly when two processes are deleting at the same time, but it would
fix the weird corner cases I described (mostly deleted refs reappearing
due to races). And the lock contention is already there in a
fully-packed repo anyway. I.e., right now we read the packed-refs file
and lock it if our to-delete ref is in there; with the proposed change,
we would lock before even reading it. So the increased contention is
only when two deleters race each other, _and_ one of them is not
deleting a packed ref.

-Peff

next prev parent reply	other threads:[~2013-03-16  9:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty
2013-03-13 15:34 ` Michael Haggerty
2013-03-13 17:29 ` Junio C Hamano
2013-03-13 21:58   ` Jeff King
2013-03-14  4:41     ` Michael Haggerty
2013-03-14  5:24       ` Jeff King
2013-03-14  5:32         ` Jeff King
2013-03-14 15:14           ` Junio C Hamano
2013-03-14 11:28         ` Michael Haggerty
2013-03-14 13:40           ` Jeff King
2013-03-15  5:12             ` Michael Haggerty
2013-03-15 16:28               ` Junio C Hamano
2013-03-16  8:48             ` Michael Haggerty
2013-03-16  9:34               ` Jeff King [this message]
2013-03-16 13:38                 ` Michael Haggerty
2013-03-18  3:17                   ` Michael Haggerty
2013-03-22 17:42                   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130316093441.GA26260@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).