From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Junio C Hamano <gitster@pobox.com>,
git discussion list <git@vger.kernel.org>
Subject: Re: Tag peeling peculiarities
Date: Sat, 16 Mar 2013 05:34:41 -0400 [thread overview]
Message-ID: <20130316093441.GA26260@sigill.intra.peff.net> (raw)
In-Reply-To: <514431EA.2050402@alum.mit.edu>
On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote:
> My patch series is nearly done. I will need another day or two to
> review and make it submission-ready, but I wanted to give you an idea of
> what I'm up to and I could also use your feedback on some points.
I was just sending out my cleaned up patches to do do fully-peeled, too.
I think the duplication is OK, though, as your topic would be for
"master" and mine can go to "maint".
> * Implement fully-peeled packed-refs files, as discussed upthread: peel
> references outside of refs/tags, and keep rigorous track of
> which references have REF_KNOWS_PEELED.
Looks pretty similar to mine. We even added similar tests.
I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel
line" optimization. I didn't bother, because this will never be
triggered by a git-written file (either we do not write the entry, or we
set fully-peeled). I wonder if any other implementation does peel every
ref, though.
> * Change pack-refs to use the peeled information from ref_entries if it
> is available, rather than looking up the references again.
I don't know that it matters from a performance standpoint (we don't
really care how long pack-refs takes, as long as it is within reason,
because it is run as part of a gc). But it would mean that any errors
in the file are propagated from one run of pack-refs to the next. I
don't know if we want to spend the extra time to be more robust.
> * repack_without_ref() writes peeled refs to the packed-refs file.
> Use the old values if available; otherwise look them up.
Whereas here we probably _do_ want the performance optimization, because
we do care about the performance of a ref deletion.
> 1. There are multiple functions for peeling refs:
> peel_ref() (and peel_object(), which was extracted from it);
> peel_entry() (derived from pack-refs.c:pack_one_ref()). It would be
> nice to combine these into the One True Function. But given the
> problem that you mentioned above (which is rooted in parts of the
> code that I don't know) I don't know what that version should do.
I'm not sure I understand the question. Just skimming your patches, it
looks like peel_entry could just call peel_object?
> 2. repack_without_ref() now writes peeled refs, peeling them if
> necessary. It does this *without* referring to the loose refs
> to avoid having to load all of the loose references, which can be
> time-consuming. But this means that it might try to lookup SHA1s
> that are not the current value of the corresponding reference any
> more, and might even have been garbage collected.
Yuck. I really wonder if repack_without_ref should literally just be
writing out the exact same file, minus the lines for the deleted ref.
Is there a reason we need to do any processing at all? I guess the
answer is that you want to avoid re-parsing the current file, and just
write it out from memory. But don't we need to refresh the memory cache
from disk under the lock anyway? That was the pack-refs race that I
fixed recently.
> Is the code that
> I use to do this, in peel_entry(), safe to call for nonexistent
> SHA1s (I would like it to return 0 if the SHA1 is invalid)?:
>
> o = parse_object(entry->u.value.sha1);
> if (o->type == OBJ_TAG) {
> o = deref_tag(o, entry->name, 0);
> if (o) {
> hashcpy(entry->u.value.peeled, o->sha1);
> entry->flag |= REF_KNOWS_PEELED;
> return 1;
> }
> }
> return 0;
I think this approach is safe, as parse_object will silently return NULL
for a missing object. You do need to check for "o != NULL" in your
conditional, though.
> 3. This same change to repack_with_ref() means that it could become
> significantly slower to delete a packed ref if the packed-ref file
> is not fully-peeled. On the plus side, once this is done once, the
> packed-refs files will be rewritten in fully-peeled form. But if
> two versions of Git are being used in a repository, this cost could
> be incurred repeatedly. Does anybody object?
I think it's OK in concept. But I still am wondering if it would be
simpler still to just pass the file through while locked.
> 4. Should I change the locking policy as discussed in this thread?:
>
> http://article.gmane.org/gmane.comp.version-control.git/212131
I think it's overall a sane direction. It does increase lock contention
slightly when two processes are deleting at the same time, but it would
fix the weird corner cases I described (mostly deleted refs reappearing
due to races). And the lock contention is already there in a
fully-packed repo anyway. I.e., right now we read the packed-refs file
and lock it if our to-delete ref is in there; with the proposed change,
we would lock before even reading it. So the increased contention is
only when two deleters race each other, _and_ one of them is not
deleting a packed ref.
-Peff
next prev parent reply other threads:[~2013-03-16 9:35 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty
2013-03-13 15:34 ` Michael Haggerty
2013-03-13 17:29 ` Junio C Hamano
2013-03-13 21:58 ` Jeff King
2013-03-14 4:41 ` Michael Haggerty
2013-03-14 5:24 ` Jeff King
2013-03-14 5:32 ` Jeff King
2013-03-14 15:14 ` Junio C Hamano
2013-03-14 11:28 ` Michael Haggerty
2013-03-14 13:40 ` Jeff King
2013-03-15 5:12 ` Michael Haggerty
2013-03-15 16:28 ` Junio C Hamano
2013-03-16 8:48 ` Michael Haggerty
2013-03-16 9:34 ` Jeff King [this message]
2013-03-16 13:38 ` Michael Haggerty
2013-03-18 3:17 ` Michael Haggerty
2013-03-22 17:42 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130316093441.GA26260@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).