* Tag peeling peculiarities @ 2013-03-13 14:59 Michael Haggerty 2013-03-13 15:34 ` Michael Haggerty 2013-03-13 17:29 ` Junio C Hamano 0 siblings, 2 replies; 17+ messages in thread From: Michael Haggerty @ 2013-03-13 14:59 UTC (permalink / raw) To: git discussion list; +Cc: Junio C Hamano I have been working on the pack-refs code [1] and noticed what looks like a problem with the handling of peeled refs in the packed-refs file and in the reference cache. In particular, the peeled versions of tags outside of refs/tags are *not* stored in packed-refs, but after the packed-refs file is read it is assumed that such tags cannot be peeled. It is clear that annotated tags want to live under refs/tags, but there are some ways to create them in other places (see below). It is not clear to me whether the prohibition of tags outside of refs/tags should be made more airtight or whether the peeling of tags outside of refs/tags should be fixed. Example: ~/tmp$ git init foo Initialized empty Git repository in /home/mhagger/tmp/foo/.git/ ~/tmp$ cd foo ~/tmp/foo$ git config user.name 'Lou User' ~/tmp/foo$ git config user.email 'luser@exaple.com' ~/tmp/foo$ ~/tmp/foo$ git commit --allow-empty -m "Initial commit" [master (root-commit) 7e80ddd] Initial commit ~/tmp/foo$ git tag -m footag footag ~/tmp/foo$ ~/tmp/foo$ # This is prohibited: ~/tmp/foo$ git update-ref refs/heads/foobranch $(git rev-parse footag) error: Trying to write non-commit object d9cdc84dd156ff83799f5226794711fbb2c8273a to branch refs/heads/foobranch fatal: Cannot update the ref 'refs/heads/foobranch'. ~/tmp/foo$ ~/tmp/foo$ # But this is allowed: ~/tmp/foo$ git update-ref refs/remotes/foo/bar $(git rev-parse footag) ~/tmp/foo$ ~/tmp/foo$ # So is this: ~/tmp/foo$ git update-ref refs/yak/foobranch $(git rev-parse footag) ~/tmp/foo$ ~/tmp/foo$ # Before packing, all tags are available in peel versions: ~/tmp/foo$ git show-ref -d 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/heads/master d9cdc84dd156ff83799f5226794711fbb2c8273a refs/remotes/foo/bar 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/remotes/foo/bar^{} d9cdc84dd156ff83799f5226794711fbb2c8273a refs/tags/footag 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/tags/footag^{} d9cdc84dd156ff83799f5226794711fbb2c8273a refs/yak/foobranch 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/yak/foobranch^{} ~/tmp/foo$ ~/tmp/foo$ git pack-refs --all ~/tmp/foo$ ~/tmp/foo$ # After packing, tags outside of refs/tags are not peeled any more: ~/tmp/foo$ git show-ref -d 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/heads/master d9cdc84dd156ff83799f5226794711fbb2c8273a refs/remotes/foo/bar d9cdc84dd156ff83799f5226794711fbb2c8273a refs/tags/footag 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/tags/footag^{} d9cdc84dd156ff83799f5226794711fbb2c8273a refs/yak/foobranch ~/tmp/foo$ ~/tmp/foo$ # Peeling the tags via "tag^0" works even after packing: ~/tmp/foo$ git rev-parse refs/yak/foobranch^0 7e80ddd68f0225a0ea221f7cddbacf050be5a265 ~/tmp/foo$ ~/tmp/foo$ # Here is another way to create a tag outside of refs/tags: ~/tmp/foo$ cd .. ~/tmp$ git clone foo foo-clone Cloning into 'foo-clone'... done. ~/tmp$ cd foo-clone ~/tmp/foo-clone$ git config --add remote.origin.fetch '+refs/tags/*:refs/remotes/origin/tags/*' ~/tmp/foo-clone$ git fetch From /home/mhagger/tmp/foo * [new tag] footag -> origin/tags/footag ~/tmp/foo-clone$ ~/tmp/foo-clone$ # Again, the tag outside of refs/tags are not peeled correctly after packing: ~/tmp/foo-clone$ git pack-refs --all ~/tmp/foo-clone$ git show-ref -d 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/heads/master 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/remotes/origin/HEAD 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/remotes/origin/master d9cdc84dd156ff83799f5226794711fbb2c8273a refs/remotes/origin/tags/footag d9cdc84dd156ff83799f5226794711fbb2c8273a refs/tags/footag 7e80ddd68f0225a0ea221f7cddbacf050be5a265 refs/tags/footag^{} Michael [1] I am trying to fix the problem that peeled refs are lost whenever a packed reference is deleted. -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty @ 2013-03-13 15:34 ` Michael Haggerty 2013-03-13 17:29 ` Junio C Hamano 1 sibling, 0 replies; 17+ messages in thread From: Michael Haggerty @ 2013-03-13 15:34 UTC (permalink / raw) To: git discussion list; +Cc: Junio C Hamano On 03/13/2013 03:59 PM, Michael Haggerty wrote: > I have been working on the pack-refs code [1] and noticed what looks > like a problem with the handling of peeled refs in the packed-refs file > and in the reference cache. In particular, the peeled versions of tags > outside of refs/tags are *not* stored in packed-refs, but after the > packed-refs file is read it is assumed that such tags cannot be peeled. > > It is clear that annotated tags want to live under refs/tags, but there > are some ways to create them in other places (see below). It is not > clear to me whether the prohibition of tags outside of refs/tags should > be made more airtight or whether the peeling of tags outside of > refs/tags should be fixed. > > Example: > [...] I should have mentioned that I already understand the programmatic *cause* of the behavior that I described in my last email: * in pack-refs.c:handle_one_ref(), tags that are not in refs/tags are explicitly excluded from being peeled. * in refs.c:read_packed_refs(), if the packed-refs file starts with "# pack-refs with: peeled " then the REF_KNOWS_PEELED bit is set on *every* reference read from the file into the packed refs cache, whether or not it is under refs/tags. * in refs.c:peel_ref(), if a refs cache entry has its REF_KNOWS_PEELED bit set but its peeled field is empty, then it is assumed that the reference is unpeelable. What I am *not* clear about is which of these steps is incorrect, and also whether this problem will have any significant ill effects. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty 2013-03-13 15:34 ` Michael Haggerty @ 2013-03-13 17:29 ` Junio C Hamano 2013-03-13 21:58 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Junio C Hamano @ 2013-03-13 17:29 UTC (permalink / raw) To: Michael Haggerty; +Cc: git discussion list Michael Haggerty <mhagger@alum.mit.edu> writes: > It is not > clear to me whether the prohibition of tags outside of refs/tags should > be made more airtight or whether the peeling of tags outside of > refs/tags should be fixed. Retroactively forbidding presense/creation of tags outside the designated refs/tags hierarchy will not fly. I think we should peel them when we are reading from "# pack-refs with: peeled" version. Theoretically, we could introduce "# pack-refs with: fully-peeled" that records peeled versions for _all_ annotated tags even in unusual places, but that would be introducing problems to existing versions of the software to cater to use cases that is not blessed officially, so I doubt it has much value. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-13 17:29 ` Junio C Hamano @ 2013-03-13 21:58 ` Jeff King 2013-03-14 4:41 ` Michael Haggerty 0 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2013-03-13 21:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: Michael Haggerty, git discussion list On Wed, Mar 13, 2013 at 10:29:54AM -0700, Junio C Hamano wrote: > Michael Haggerty <mhagger@alum.mit.edu> writes: > > > It is not > > clear to me whether the prohibition of tags outside of refs/tags should > > be made more airtight or whether the peeling of tags outside of > > refs/tags should be fixed. > > Retroactively forbidding presense/creation of tags outside the > designated refs/tags hierarchy will not fly. I think we should peel > them when we are reading from "# pack-refs with: peeled" version. > > Theoretically, we could introduce "# pack-refs with: fully-peeled" > that records peeled versions for _all_ annotated tags even in > unusual places, but that would be introducing problems to existing > versions of the software to cater to use cases that is not blessed > officially, so I doubt it has much value. I agree. The REF_KNOWS_PEELED thing is an optimization, so it's OK to simply omit the optimization in the corner case, since we don't expect it to happen often. So what happens now is a bug, but it is a bug in the reader that mis-applies the optimization, and we can just fix the reader. I do not think there is any point in peeling while we are reading the pack-refs file; it is no more expensive to do so later, and in most cases we will not even bother peeling. We should simply omit the REF_KNOWS_PEELED bit so that later calls to peel_ref do not follow the optimization code path. Something like this: diff --git a/refs.c b/refs.c index 175b9fc..ee498c8 100644 --- a/refs.c +++ b/refs.c @@ -824,7 +824,10 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) refname = parse_ref_line(refline, sha1); if (refname) { - last = create_ref_entry(refname, sha1, flag, 1); + int this_flag = flag; + if (prefixcmp(refname, "refs/tags/")) + this_flag &= ~REF_KNOWS_PEELED; + last = create_ref_entry(refname, sha1, this_flag, 1); add_ref(dir, last); continue; } I think with this patch, though, that upload-pack would end up having to read the object type of everything under refs/heads to decide whether it needs to be peeled (and in most cases, it does not, making the initial ref advertisement potentially much more expensive). How do we want to handle that? Should we teach upload-pack not to bother advertising peels outside of refs/tags? That would break people who expect tag auto-following to work for refs outside of refs/tags, but I am not sure that is a sane thing to expect. -Peff ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-13 21:58 ` Jeff King @ 2013-03-14 4:41 ` Michael Haggerty 2013-03-14 5:24 ` Jeff King 0 siblings, 1 reply; 17+ messages in thread From: Michael Haggerty @ 2013-03-14 4:41 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git discussion list On 03/13/2013 10:58 PM, Jeff King wrote: > On Wed, Mar 13, 2013 at 10:29:54AM -0700, Junio C Hamano wrote: > >> Michael Haggerty <mhagger@alum.mit.edu> writes: >> >>> It is not >>> clear to me whether the prohibition of tags outside of refs/tags should >>> be made more airtight or whether the peeling of tags outside of >>> refs/tags should be fixed. >> >> Retroactively forbidding presense/creation of tags outside the >> designated refs/tags hierarchy will not fly. I think we should peel >> them when we are reading from "# pack-refs with: peeled" version. >> >> Theoretically, we could introduce "# pack-refs with: fully-peeled" >> that records peeled versions for _all_ annotated tags even in >> unusual places, but that would be introducing problems to existing >> versions of the software to cater to use cases that is not blessed >> officially, so I doubt it has much value. I think that instead of changing "peeled" to "fully-peeled", it would be better to add "fully-peeled" as an additional keyword, like # pack-refs with: peeled fully-peeled Old readers would still see the "peeled" keyword and ignore the fully-peeled keyword, and would be able to read the file correctly. See below for more discussion. > I agree. The REF_KNOWS_PEELED thing is an optimization, so it's OK to > simply omit the optimization in the corner case, since we don't expect > it to happen often. So what happens now is a bug, but it is a bug in the > reader that mis-applies the optimization, and we can just fix the > reader. > > I do not think there is any point in peeling while we are reading the > pack-refs file; it is no more expensive to do so later, and in most > cases we will not even bother peeling. We should simply omit the > REF_KNOWS_PEELED bit so that later calls to peel_ref do not follow the > optimization code path. Something like this: > > diff --git a/refs.c b/refs.c > index 175b9fc..ee498c8 100644 > --- a/refs.c > +++ b/refs.c > @@ -824,7 +824,10 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) > > refname = parse_ref_line(refline, sha1); > if (refname) { > - last = create_ref_entry(refname, sha1, flag, 1); > + int this_flag = flag; > + if (prefixcmp(refname, "refs/tags/")) > + this_flag &= ~REF_KNOWS_PEELED; > + last = create_ref_entry(refname, sha1, this_flag, 1); > add_ref(dir, last); > continue; > } It would also be possible to set the REF_KNOWS_PEELED for any entries for which a peeled reference happens to be present in the packed-refs file, though the code would never be triggered if the current writer is not changed. > I think with this patch, though, that upload-pack would end up having to > read the object type of everything under refs/heads to decide whether it > needs to be peeled (and in most cases, it does not, making the initial > ref advertisement potentially much more expensive). How do we want to > handle that? Should we teach upload-pack not to bother advertising peels > outside of refs/tags? That would break people who expect tag > auto-following to work for refs outside of refs/tags, but I am not sure > that is a sane thing to expect. Here is analysis of our options as I see them: 1. Accept that tags outside of refs/tags are not reliably advertised in their peeled form. Document this deficiency and either: a. Don't even bother trying to peel refs outside of refs/tags (the status quo). b. Change the pack-refs writer to write all peeled refs, but leave the reader unchanged. This is a low-risk option that would cause old and new clients to do the right thing when reading a full packed-refs file, but an old or new client reading a non-full packed-refs file would not realize that it is non-full and would fail to advertise all peeled refs. Minor disadvantage: pack-refs becomes slower. 2. Insist that tags outside of refs/tags are reliably advertised. I see three ways to do it: a. Without changing the packed-refs contents. This would require upload-pack to read *all* references outside of refs/tags. (This is what Peff's patch does.) b. Write all peeled refs to packed-refs without changing the packed-refs header. This would hardly help, as upload-pack would still have to read all non-tag references outside of refs/tags to be sure that none are tags. c. Add a new keyword to the top of the packed-refs file as described above. Then * Old writer, new reader: the reader would know that some peeled refs might be missing. upload-pack would have to resolve refs outside of refs/tags, but could optionally write a new-format packed-refs file to avoid repeating the work. * New writer, new reader: the reader would know that all refs are peeled properly and would not have to read any objects. * Old writer, old reader: status quo; peeled refs are advertised incompletely. * New writer, old reader: This is the interesting case. The current code in Git would read the header and see "peeled" but ignore "fully-peeled". But the line-reading code would nevertheless happily read and record peeled references no matter what namespace they live in. It would also advertise them correctly. One would have to check historical versions and other clients to see whether they would also behave well. d. Add some new notation, like a "^" on a line by itself, to mean "I tried to peel this reference but it was not an annotated tag". Old readers ignore such lines but new readers could take it as an indication to set the REF_KNOWS_PEELED bit for that entry. (It is not clear to me whether it would be permissible to make a change like this without changing the header line.) I think the best option would be 1b. Even though there would never be a guarantee that all peeled references are recorded and advertised correctly, the behavior would asymptotically approach correctness as old Git versions are retired, and the situations where it fails are probably rare and no worse than the status quo. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 4:41 ` Michael Haggerty @ 2013-03-14 5:24 ` Jeff King 2013-03-14 5:32 ` Jeff King 2013-03-14 11:28 ` Michael Haggerty 0 siblings, 2 replies; 17+ messages in thread From: Jeff King @ 2013-03-14 5:24 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list On Thu, Mar 14, 2013 at 05:41:58AM +0100, Michael Haggerty wrote: > Here is analysis of our options as I see them: > > 1. Accept that tags outside of refs/tags are not reliably advertised in > their peeled form. Document this deficiency and either: > > a. Don't even bother trying to peel refs outside of refs/tags (the > status quo). When you say "not reliably advertised" you mean from upload-pack, right? What about other callers? From my reading of peel_ref, anybody who calls it to get a packed ref may actually get a wrong answer. So this is not just about tag auto-following over fetch, but about other places, too (however, a quick grep does not make it look like this other places are all that commonly used). Another fun fact: upload-pack did not use peel_ref until recently (435c833, in v1.8.1). So while it is tempting to say "well, this was always broken, and nobody cared", it was not really; it is a fairly recent regression in 435c833. > b. Change the pack-refs writer to write all peeled refs, but leave > the reader unchanged. This is a low-risk option that would cause > old and new clients to do the right thing when reading a full > packed-refs file, but an old or new client reading a non-full > packed-refs file would not realize that it is non-full and would > fail to advertise all peeled refs. Minor disadvantage: pack-refs > becomes slower. This seems sane. The missing thing is a flag to tell the reader that it was written by a newer version; I see you dealt with that case below. I don't think pack-refs being a little bit slower matters. Checking the types to peel is not even that much work; it's just that it adds up when you do it for every no-op fetch's ref advertisement. But we can assume that writing happens a lot less than reading; that is the point of storing the peeled information in the first place. If that assumption is not correct in some scenario, then those people should probably not be packing their refs at all, so I think we can discount them from this discussion. > 2. Insist that tags outside of refs/tags are reliably advertised. I > see three ways to do it: > > a. Without changing the packed-refs contents. This would require > upload-pack to read *all* references outside of refs/tags. (This > is what Peff's patch does.) FWIW, this is what upload-pack used to do before I switched it to use peel_ref. The savings are measurable, though they are not ground-breaking. Still, I think your "fully-packed" proposal above lets us keep the improvement without too much effort. > b. Write all peeled refs to packed-refs without changing the > packed-refs header. This would hardly help, as upload-pack > would still have to read all non-tag references outside of > refs/tags to be sure that none are tags. Right, this seems silly; the new reader does extra work for compatibility with an older writer, but the normal case is to use the same version. The obvious optimization is to add a flag that says "hey, I was written by a new writer". And that is your "fully-packed" proposal, covered below. > c. Add a new keyword to the top of the packed-refs file as > described above. Then > > * Old writer, new reader: the reader would know that some > peeled refs might be missing. upload-pack would have to > resolve refs outside of refs/tags, but could optionally write > a new-format packed-refs file to avoid repeating the work. I think that is OK. For the most part, this is a temporary situation that happens when you are moving from an older to a newer version of git. If you are switching back and forth between versions, then we are correct, but you don't get the benefit of the micro-optimization, which seems fair. > * New writer, new reader: the reader would know that all refs > are peeled properly and would not have to read any objects. This is the common case I think we should be optimizing for. And obviously the outcome here is good. It's also fine without adding the "fully-packed" flag, though. > * Old writer, old reader: status quo; peeled refs are advertised > incompletely. Right. We can't fix those without a time machine, though. > * New writer, old reader: This is the interesting case. The > current code in Git would read the header and see "peeled" but > ignore "fully-peeled". But the line-reading code would > nevertheless happily read and record peeled references no > matter what namespace they live in. It would also advertise > them correctly. One would have to check historical versions > and other clients to see whether they would also behave well. Right. So we have pretty sane backwards-compatible behavior. I think if other implementations are not happy seeing a new tag, they are wrong. The whole point of the "#"-tags is for future expansion. Looking over libgit2, it seems to ignore the "#"-line completely, so I think it would behave OK (and it should be taught about fully-peeled, too, but that is not our immediate problem). > d. Add some new notation, like a "^" on a line by itself, to mean > "I tried to peel this reference but it was not an annotated tag". > Old readers ignore such lines but new readers could take it > as an indication to set the REF_KNOWS_PEELED bit for that entry. > (It is not clear to me whether it would be permissible to make a > change like this without changing the header line.) I don't see the point. The writer wants to provide REF_KNOWS_PEELED for every entry in its list so that the reader (which is run more often) does not have to put forth any effort. So this accomplishes the same thing as a "anything without a peeled ref did not need peeling" flag, but takes more space. > I think the best option would be 1b. Even though there would never be a > guarantee that all peeled references are recorded and advertised > correctly, the behavior would asymptotically approach correctness as old > Git versions are retired, and the situations where it fails are probably > rare and no worse than the status quo. Thanks for laying out the options. I think 1b or 2c are the only sane paths forward. With either option, a newer writer produces something that all implementations, old and new, should get right, and that is primarily what we care about. So the only question is how much work we want to put into making sure the new reader handles the old writer correctly. Doing 2c is obviously more rigorous, and it is not that much work to add the fully-packed flag, but I kind of wonder if anybody even cares. We can just say "it's a bug fix; run `git pack-refs` again if you care" and call it a day (i.e., 1b). I could go either way. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 5:24 ` Jeff King @ 2013-03-14 5:32 ` Jeff King 2013-03-14 15:14 ` Junio C Hamano 2013-03-14 11:28 ` Michael Haggerty 1 sibling, 1 reply; 17+ messages in thread From: Jeff King @ 2013-03-14 5:32 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list On Thu, Mar 14, 2013 at 01:24:48AM -0400, Jeff King wrote: > So the only question is how much work we want to put into making sure > the new reader handles the old writer correctly. Doing 2c is obviously > more rigorous, and it is not that much work to add the fully-packed > flag, but I kind of wonder if anybody even cares. We can just say "it's > a bug fix; run `git pack-refs` again if you care" and call it a day > (i.e., 1b). Urgh, for some reason I kept writing "fully-packed" but obviously I meant "fully-peeled". Hopefully you figured it out. Just as a quick sketch of how much work is in involved in 2c, I think the complete solution would look like this (but note I haven't tested this at all): diff --git a/pack-refs.c b/pack-refs.c index f09a054..261a6a6 100644 --- a/pack-refs.c +++ b/pack-refs.c @@ -27,6 +27,7 @@ static int handle_one_ref(const char *path, const unsigned char *sha1, int flags, void *cb_data) { struct pack_refs_cb_data *cb = cb_data; + struct object *o; int is_tag_ref; /* Do not pack the symbolic refs */ @@ -39,14 +40,12 @@ static int handle_one_ref(const char *path, const unsigned char *sha1, return 0; fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path); - if (is_tag_ref) { - struct object *o = parse_object(sha1); - if (o->type == OBJ_TAG) { - o = deref_tag(o, path, 0); - if (o) - fprintf(cb->refs_file, "^%s\n", - sha1_to_hex(o->sha1)); - } + o = parse_object(sha1); + if (o->type == OBJ_TAG) { + o = deref_tag(o, path, 0); + if (o) + fprintf(cb->refs_file, "^%s\n", + sha1_to_hex(o->sha1)); } if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) { @@ -128,7 +127,7 @@ int pack_refs(unsigned int flags) die_errno("unable to create ref-pack file structure"); /* perhaps other traits later as well */ - fprintf(cbdata.refs_file, "# pack-refs with: peeled \n"); + fprintf(cbdata.refs_file, "# pack-refs with: peeled fully-peeled \n"); for_each_ref(handle_one_ref, &cbdata); if (ferror(cbdata.refs_file)) diff --git a/refs.c b/refs.c index 175b9fc..770abf4 100644 --- a/refs.c +++ b/refs.c @@ -808,6 +808,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) struct ref_entry *last = NULL; char refline[PATH_MAX]; int flag = REF_ISPACKED; + int fully_peeled = 0; while (fgets(refline, sizeof(refline), f)) { unsigned char sha1[20]; @@ -818,13 +819,18 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) const char *traits = refline + sizeof(header) - 1; if (strstr(traits, " peeled ")) flag |= REF_KNOWS_PEELED; + if (strstr(traits, " fully-peeled ")) + fully_peeled = 1; /* perhaps other traits later as well */ continue; } refname = parse_ref_line(refline, sha1); if (refname) { - last = create_ref_entry(refname, sha1, flag, 1); + int this_flag = flag; + if (!fully_peeled && prefixcmp(refname, "refs/tags/")) + this_flag &= ~REF_KNOWS_PEELED; + last = create_ref_entry(refname, sha1, this_flag, 1); add_ref(dir, last); continue; } So it's really not that much code. The bigger question is whether we want to have to carry the "fully-peeled" tag forever, and how other implementations would treat it. -Peff ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 5:32 ` Jeff King @ 2013-03-14 15:14 ` Junio C Hamano 0 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2013-03-14 15:14 UTC (permalink / raw) To: Jeff King; +Cc: Michael Haggerty, git discussion list Jeff King <peff@peff.net> writes: > On Thu, Mar 14, 2013 at 01:24:48AM -0400, Jeff King wrote: > >> So the only question is how much work we want to put into making sure >> the new reader handles the old writer correctly. Doing 2c is obviously >> more rigorous, and it is not that much work to add the fully-packed >> flag, but I kind of wonder if anybody even cares. We can just say "it's >> a bug fix; run `git pack-refs` again if you care" and call it a day >> (i.e., 1b). > > Urgh, for some reason I kept writing "fully-packed" but obviously I > meant "fully-peeled". Hopefully you figured it out. > > Just as a quick sketch of how much work is in involved in 2c, I think > the complete solution would look like this (but note I haven't tested > this at all): Looks reasonable from a cursory look, and I think we cannot avoid going with the route to add "fully-peeled" token, because we do have to express "we know this ref outside refs/tags/ area is not an annotated tag" by marking them as REF_KNOWS_PEELED for the peel_ref() optimization to work. > diff --git a/pack-refs.c b/pack-refs.c > index f09a054..261a6a6 100644 > --- a/pack-refs.c > +++ b/pack-refs.c > @@ -27,6 +27,7 @@ static int handle_one_ref(const char *path, const unsigned char *sha1, > int flags, void *cb_data) > { > struct pack_refs_cb_data *cb = cb_data; > + struct object *o; > int is_tag_ref; > > /* Do not pack the symbolic refs */ > @@ -39,14 +40,12 @@ static int handle_one_ref(const char *path, const unsigned char *sha1, > return 0; > > fprintf(cb->refs_file, "%s %s\n", sha1_to_hex(sha1), path); > - if (is_tag_ref) { > - struct object *o = parse_object(sha1); > - if (o->type == OBJ_TAG) { > - o = deref_tag(o, path, 0); > - if (o) > - fprintf(cb->refs_file, "^%s\n", > - sha1_to_hex(o->sha1)); > - } > + o = parse_object(sha1); > + if (o->type == OBJ_TAG) { > + o = deref_tag(o, path, 0); > + if (o) > + fprintf(cb->refs_file, "^%s\n", > + sha1_to_hex(o->sha1)); > } > > if ((cb->flags & PACK_REFS_PRUNE) && !do_not_prune(flags)) { > @@ -128,7 +127,7 @@ int pack_refs(unsigned int flags) > die_errno("unable to create ref-pack file structure"); > > /* perhaps other traits later as well */ > - fprintf(cbdata.refs_file, "# pack-refs with: peeled \n"); > + fprintf(cbdata.refs_file, "# pack-refs with: peeled fully-peeled \n"); > > for_each_ref(handle_one_ref, &cbdata); > if (ferror(cbdata.refs_file)) > diff --git a/refs.c b/refs.c > index 175b9fc..770abf4 100644 > --- a/refs.c > +++ b/refs.c > @@ -808,6 +808,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) > struct ref_entry *last = NULL; > char refline[PATH_MAX]; > int flag = REF_ISPACKED; > + int fully_peeled = 0; > > while (fgets(refline, sizeof(refline), f)) { > unsigned char sha1[20]; > @@ -818,13 +819,18 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) > const char *traits = refline + sizeof(header) - 1; > if (strstr(traits, " peeled ")) > flag |= REF_KNOWS_PEELED; > + if (strstr(traits, " fully-peeled ")) > + fully_peeled = 1; > /* perhaps other traits later as well */ > continue; > } > > refname = parse_ref_line(refline, sha1); > if (refname) { > - last = create_ref_entry(refname, sha1, flag, 1); > + int this_flag = flag; > + if (!fully_peeled && prefixcmp(refname, "refs/tags/")) > + this_flag &= ~REF_KNOWS_PEELED; > + last = create_ref_entry(refname, sha1, this_flag, 1); > add_ref(dir, last); > continue; > } > > So it's really not that much code. The bigger question is whether we > want to have to carry the "fully-peeled" tag forever, and how other > implementations would treat it. > > -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 5:24 ` Jeff King 2013-03-14 5:32 ` Jeff King @ 2013-03-14 11:28 ` Michael Haggerty 2013-03-14 13:40 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Michael Haggerty @ 2013-03-14 11:28 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git discussion list On 03/14/2013 06:24 AM, Jeff King wrote: > On Thu, Mar 14, 2013 at 05:41:58AM +0100, Michael Haggerty wrote: > >> Here is analysis of our options as I see them: >> >> 1. Accept that tags outside of refs/tags are not reliably advertised in >> their peeled form. Document this deficiency and either: >> >> a. Don't even bother trying to peel refs outside of refs/tags (the >> status quo). > > When you say "not reliably advertised" you mean from upload-pack, right? > What about other callers? From my reading of peel_ref, anybody who calls > it to get a packed ref may actually get a wrong answer. So this is not > just about tag auto-following over fetch, but about other places, too > (however, a quick grep does not make it look like this other places are > all that commonly used). Yes, this is a good point. I didn't audit all of the callers to see whether any of them need 100% reliability, but I guess I should: upload-pack.c:763: in send_ref(): This is the caller that we have been talking about. builtin/pack-objects.c:559: in mark_tagged(): This function is only called for references under refs/tags. builtin/pack-objects.c:2035: in add_ref_tag(): peel_ref() is called only for refnames under refs/tags. builtin/describe.c:147: in get_name(): peel_ref() is called only for refnames under refs/tags. builtin/show-ref.c:81: in show_ref(): this is broken too, as I showed in my original email. This breakage was also introduced in 435c833. Perhaps if peel_ref() *were* 100% reliable, we might be able to use it to avoid object lookups in some other places. > Another fun fact: upload-pack did not use peel_ref until recently > (435c833, in v1.8.1). So while it is tempting to say "well, this was > always broken, and nobody cared", it was not really; it is a fairly > recent regression in 435c833. I didn't realize that; thanks for pointing it out. Although peel_ref() itself has been broken since it was introduced, at least in the sense that it gives wrong answers for tags outside of refs/tags, prior to 435c833 it appears to only have been used for refs/tags. >> I think the best option would be 1b. Even though there would never be a >> guarantee that all peeled references are recorded and advertised >> correctly, the behavior would asymptotically approach correctness as old >> Git versions are retired, and the situations where it fails are probably >> rare and no worse than the status quo. > > Thanks for laying out the options. I think 1b or 2c are the only sane > paths forward. With either option, a newer writer produces something > that all implementations, old and new, should get right, and that is > primarily what we care about. > > So the only question is how much work we want to put into making sure > the new reader handles the old writer correctly. Doing 2c is obviously > more rigorous, and it is not that much work to add the fully-packed > flag, but I kind of wonder if anybody even cares. We can just say "it's > a bug fix; run `git pack-refs` again if you care" and call it a day > (i.e., 1b). > > I could go either way. On 03/14/2013 06:32 AM, Jeff King wrote: > [...] > So it's really not that much code. The bigger question is whether we > want to have to carry the "fully-peeled" tag forever, and how other > implementations would treat it. I agree that 2c is also an attractive option. Its biggest advantage is that it make peel_ref() reliable and therefore potentially usable for other purposes. And probably the effort of carrying forward the "fully-peeled" tag is no more work than the alternative of carrying forward the knowledge that peel_ref() is unreliable. Your patch looks about right aside from its lack of comments :-). I was going to implement approximately the same thing in a patch series that I am working on, but if you prefer then go ahead and submit your patch and I will rebase my branch on top of it. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 11:28 ` Michael Haggerty @ 2013-03-14 13:40 ` Jeff King 2013-03-15 5:12 ` Michael Haggerty 2013-03-16 8:48 ` Michael Haggerty 0 siblings, 2 replies; 17+ messages in thread From: Jeff King @ 2013-03-14 13:40 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote: > Perhaps if peel_ref() *were* 100% reliable, we might be able to use it > to avoid object lookups in some other places. In theory, some of the many uses of deref_tag could be adopted. However, we do not always have the refname handy at that point (and I believe peel_ref's optimization only kicks in during for_each_ref traversals anyway). It may still be a win to check the packed-refs file before peeling a random sha1, as looking up there should be cheaper than actually loading the object. But right now, the way the optimization is used is always O(1) to just check the last ref loaded. With your recent ref refactoring, I think we should be able to do lookups in O(lg n). > > Another fun fact: upload-pack did not use peel_ref until recently > > (435c833, in v1.8.1). So while it is tempting to say "well, this was > > always broken, and nobody cared", it was not really; it is a fairly > > recent regression in 435c833. > > I didn't realize that; thanks for pointing it out. Although peel_ref() > itself has been broken since it was introduced, at least in the sense > that it gives wrong answers for tags outside of refs/tags, prior to > 435c833 it appears to only have been used for refs/tags. Hmph. I coincidentally ran across another problem with 435c833 today. Try this: git init repo && cd repo && git commit --allow-empty -m one && git checkout -b other && git commit --allow-empty -m two && git checkout master && git commit --allow-empty -m three && git pack-refs --all && git.compile clone --no-local --depth=1 --no-single-branch . foo I get: Cloning into 'foo'... fatal: (null) is unknown object remote: Total 0 (delta 0), reused 0 (delta 0) fatal: recursion detected in die handler remote: aborting due to possible repository corruption on the remote side. fatal: error in sideband demultiplexer This is not due to the same problem you are describing with peel_refs, but simply due to the fact that we do not load and parse objects anymore (which is the point of the optimization). The client feeds these back to us in the "want" list, and we then feed these objects to the revision walker, which expects them to be parsed and have their "type" field actually filled in. We never noticed before because: 1. It only happens with --depth, because otherwise we do not do the revision walk in-process. 2. Modern git, when given --depth, will default to --single-branch, and so ask only about HEAD, which we do parse. However, older versions of git (pre v1.7.10) will ask for much more, and trigger the bug. I haven't tried yet, but I suspect you may also be able to trigger it by asking for "clone --depth=1 -b other". That's the overtly visible symptom. We also check the type in ok_to_give_up, so I suspect that can produce subtly wrong results in some situations. The solution is to actually parse the objects in question, but I need to figure out when is the optimal time. Obviously any time we read a "want" line would be enough, but we may be able to get by with less. OTOH, it probably doesn't matter that much; the point of the optimization was to avoid touching objects for the ref advertisement. Once we have a "want", the client really is going to fetch objects, and accessing them will probably be lost in the noise. But that's somewhat off-topic for this discussion. I'll look into it further and try to make a patch later today or tomorrow. > Your patch looks about right aside from its lack of comments :-). I was > going to implement approximately the same thing in a patch series that I > am working on, but if you prefer then go ahead and submit your patch and > I will rebase my branch on top of it. I just meant it as a quick sketch, since you said you were working in the area. I'm not sure what you are working on. If we don't consider it an urgent bugfix, I'm just as happy for you to incorporate the idea into what you are doing. But if we want to do it as a maint-track fix, I can clean up my patch. Junio? -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 13:40 ` Jeff King @ 2013-03-15 5:12 ` Michael Haggerty 2013-03-15 16:28 ` Junio C Hamano 2013-03-16 8:48 ` Michael Haggerty 1 sibling, 1 reply; 17+ messages in thread From: Michael Haggerty @ 2013-03-15 5:12 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git discussion list On 03/14/2013 02:40 PM, Jeff King wrote: > On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote: > >> Perhaps if peel_ref() *were* 100% reliable, we might be able to use it >> to avoid object lookups in some other places. > > In theory, some of the many uses of deref_tag could be adopted. However, > we do not always have the refname handy at that point (and I believe > peel_ref's optimization only kicks in during for_each_ref traversals > anyway). > > It may still be a win to check the packed-refs file before peeling a > random sha1, as looking up there should be cheaper than actually loading > the object. But right now, the way the optimization is used is always > O(1) to just check the last ref loaded. With your recent ref > refactoring, I think we should be able to do lookups in O(lg n). There are more conceptual problems in the handling of peeled references and current_ref. I want to document them here for future reference and to get other people thinking about whether other parts of the code might be making wrong assumptions about this stuff. What is stored in ref_value.peeled? Is it the peeled version of ref_value.sha1, or is it the peeled version of the associated refname? Because they are not necessarily the same thing: an entry in the packed ref_cache *might* be overridden by a loose reference with the same refname but a different SHA1 which peels to a different value. Obviously we cannot always guarantee that ref_value.peeled is the peeled version of the refname, because it is read from a packed-refs file that might be old. So the only sane policy is that ref_value.peeled should be the peeled version of ref_value.sha1 regardless of whether the refname now points elsewhere. It follows that the only time we can be sure that ref_value.peeled is the correct peeled version of refname is if we know that there is no loose reference overriding it. This leads to some subtleties when reading and writing ref_value.peeled. When reading ref_value.peeled: When we iterate over references using do_for_each_ref(), then we only present a packed ref if there is no loose ref with the same refname. So in that case it is OK to point current_ref at the packed ref entry, and if somebody calls peel_ref() for the current reference then (modulo race conditions?) current_ref->u.value.peeled is indeed the correct peeled value for that refname. But if we iterate over *only* the packed refs using do_for_each_ref_in_dir() (as, for example, in repack_without_ref()), then the iteration doesn't even look at the loose refs, so we don't know whether the packed ref_entry being iterated over is authoritative. But we nevertheless set current_ref to that entry. So if somebody would call peel_ref() on the current refname during such an iteration, they could get the peeled version of the packed ref rather than the correct peeled version for the refname. Currently, nobody calls peel_ref() during such an iteration, so I don't think it causes any bugs. But it is an unmarked landmine. When setting ref_value.peeled: When determining the value to write to ref_value.peeled (which I believe only happens in pack-refs), we should record the peeled version of ref_entry.sha1, and *not* the peeled version of refname. This is indeed what pack-refs does. But if somebody were to get the clever and reasonable-sounding idea of using peel_ref() in the implementation of pack-refs.c:handle_one_ref(), then it would be easy to accidentally get a peeled value for the current refname rather than for the SHA1 being recorded in the file. I am trying to change repack_without_refs() to write the peeled refs to the packed-refs file (currently the peeled refs are lost if a packed ref is deleted). For this I would like to reuse the pre-peeled values from the old version of the packed-refs file to avoid having to resolve them all again. So all of the issues mentioned above are coming together for me. I think my solution will be to add a static function in refs.c that passes pointers to the ref_entries to its callback, so that the function has direct access to the peeled member rather than having to access it via the information-losing peel_ref(). Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-15 5:12 ` Michael Haggerty @ 2013-03-15 16:28 ` Junio C Hamano 0 siblings, 0 replies; 17+ messages in thread From: Junio C Hamano @ 2013-03-15 16:28 UTC (permalink / raw) To: Michael Haggerty; +Cc: Jeff King, git discussion list Michael Haggerty <mhagger@alum.mit.edu> writes: > What is stored in ref_value.peeled? Is it the peeled version of > ref_value.sha1, or is it the peeled version of the associated refname? > Because they are not necessarily the same thing: an entry in the packed > ref_cache *might* be overridden by a loose reference with the same > refname but a different SHA1 which peels to a different value. The way it should work is that we look for loose ones first and then only if we do not find a loose one we use packed one, when asked for a ref by name. The .sha1 and .peeled fields for a single ref_entry struct must be consistent with each other, even though you might have got a ref_value by reading the packed-refs file and another ref_value by reading loose one and have both in core. When you have both packed and loose, the former should not be used at all, and it certainly should not be pointed by current_ref. Stepping back a bit, I suspect that it may be worth evaluating to see if it still makes sense to keep the current_ref optimization that was introduced in 0ae91be0e1fa (Optimize peel_ref for the current ref of a for_each_ref callback, 2008-02-24). Back then, it was fairly expensive to find a cached_ref entry by name, because it was implemented as two linked lists (one for loose, one for packed) and we would have had to traverse them to answer the question. Since then, e9c4c11165e4 (refs: Use binary search to lookup refs faster, 2011-09-29) introduced ref_array to make it more efficient to find a ref_entry by name, and your more recent series that includes bc5fd6d3c29f (refs.c: reorder definitions more logically, 2012-04-10) further touched that codepath for enumeration in a subtree. I am not sure if that last change helped or hurt the performance of a single ref lookup, but we can certainly say that the performance characteristics of read_ref_full() is very different in today's code and the current_ref optimization may no longer be of much value. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-14 13:40 ` Jeff King 2013-03-15 5:12 ` Michael Haggerty @ 2013-03-16 8:48 ` Michael Haggerty 2013-03-16 9:34 ` Jeff King 1 sibling, 1 reply; 17+ messages in thread From: Michael Haggerty @ 2013-03-16 8:48 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git discussion list On 03/14/2013 02:40 PM, Jeff King wrote: > Hmph. I coincidentally ran across another problem with 435c833 today. > Try this: > [...] > > But that's somewhat off-topic for this discussion. I'll look into it > further and try to make a patch later today or tomorrow. > > On Thu, Mar 14, 2013 at 12:28:53PM +0100, Michael Haggerty wrote: >> Your patch looks about right aside from its lack of comments :-). I was >> going to implement approximately the same thing in a patch series that I >> am working on, but if you prefer then go ahead and submit your patch and >> I will rebase my branch on top of it. > > I just meant it as a quick sketch, since you said you were working in > the area. I'm not sure what you are working on. If we don't consider it > an urgent bugfix, I'm just as happy for you to incorporate the idea into > what you are doing. > > But if we want to do it as a maint-track fix, I can clean up my patch. > Junio? My patch series is nearly done. I will need another day or two to review and make it submission-ready, but I wanted to give you an idea of what I'm up to and I could also use your feedback on some points. The (preliminary) patch series is available on GitHub: https://github.com/mhagger/git/commits/pack-refs-unification Highlights: * Move pack-refs code into refs.c * Implement fully-peeled packed-refs files, as discussed upthread: peel references outside of refs/tags, and keep rigorous track of which references have REF_KNOWS_PEELED. * Change how references are iterated over internally (without changing the exported API): * Provide direct access to the ref_entries * Don't set current_ref if not iterating over all refs * Change pack-refs to use the peeled information from ref_entries if it is available, rather than looking up the references again. * repack_without_ref() writes peeled refs to the packed-refs file. Use the old values if available; otherwise look them up. (The old version of repack_without_refs() wrote a packed-refs file without any peeled values even if they were present in the old packed-refs file.) Remove the deleted reference from the in-RAM packed ref cache in-place instead of discarding the whole cache. * Add some tests and lots of comments. Here are my questions for you: 1. There are multiple functions for peeling refs: peel_ref() (and peel_object(), which was extracted from it); peel_entry() (derived from pack-refs.c:pack_one_ref()). It would be nice to combine these into the One True Function. But given the problem that you mentioned above (which is rooted in parts of the code that I don't know) I don't know what that version should do. 2. repack_without_ref() now writes peeled refs, peeling them if necessary. It does this *without* referring to the loose refs to avoid having to load all of the loose references, which can be time-consuming. But this means that it might try to lookup SHA1s that are not the current value of the corresponding reference any more, and might even have been garbage collected. Is the code that I use to do this, in peel_entry(), safe to call for nonexistent SHA1s (I would like it to return 0 if the SHA1 is invalid)?: o = parse_object(entry->u.value.sha1); if (o->type == OBJ_TAG) { o = deref_tag(o, entry->name, 0); if (o) { hashcpy(entry->u.value.peeled, o->sha1); entry->flag |= REF_KNOWS_PEELED; return 1; } } return 0; I've tried to test this experimentally and it seems to work, but I would like to have some theoretical support :-) 3. This same change to repack_with_ref() means that it could become significantly slower to delete a packed ref if the packed-ref file is not fully-peeled. On the plus side, once this is done once, the packed-refs files will be rewritten in fully-peeled form. But if two versions of Git are being used in a repository, this cost could be incurred repeatedly. Does anybody object? 4. Should I change the locking policy as discussed in this thread?: http://article.gmane.org/gmane.comp.version-control.git/212131 Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-16 8:48 ` Michael Haggerty @ 2013-03-16 9:34 ` Jeff King 2013-03-16 13:38 ` Michael Haggerty 0 siblings, 1 reply; 17+ messages in thread From: Jeff King @ 2013-03-16 9:34 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote: > My patch series is nearly done. I will need another day or two to > review and make it submission-ready, but I wanted to give you an idea of > what I'm up to and I could also use your feedback on some points. I was just sending out my cleaned up patches to do do fully-peeled, too. I think the duplication is OK, though, as your topic would be for "master" and mine can go to "maint". > * Implement fully-peeled packed-refs files, as discussed upthread: peel > references outside of refs/tags, and keep rigorous track of > which references have REF_KNOWS_PEELED. Looks pretty similar to mine. We even added similar tests. I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel line" optimization. I didn't bother, because this will never be triggered by a git-written file (either we do not write the entry, or we set fully-peeled). I wonder if any other implementation does peel every ref, though. > * Change pack-refs to use the peeled information from ref_entries if it > is available, rather than looking up the references again. I don't know that it matters from a performance standpoint (we don't really care how long pack-refs takes, as long as it is within reason, because it is run as part of a gc). But it would mean that any errors in the file are propagated from one run of pack-refs to the next. I don't know if we want to spend the extra time to be more robust. > * repack_without_ref() writes peeled refs to the packed-refs file. > Use the old values if available; otherwise look them up. Whereas here we probably _do_ want the performance optimization, because we do care about the performance of a ref deletion. > 1. There are multiple functions for peeling refs: > peel_ref() (and peel_object(), which was extracted from it); > peel_entry() (derived from pack-refs.c:pack_one_ref()). It would be > nice to combine these into the One True Function. But given the > problem that you mentioned above (which is rooted in parts of the > code that I don't know) I don't know what that version should do. I'm not sure I understand the question. Just skimming your patches, it looks like peel_entry could just call peel_object? > 2. repack_without_ref() now writes peeled refs, peeling them if > necessary. It does this *without* referring to the loose refs > to avoid having to load all of the loose references, which can be > time-consuming. But this means that it might try to lookup SHA1s > that are not the current value of the corresponding reference any > more, and might even have been garbage collected. Yuck. I really wonder if repack_without_ref should literally just be writing out the exact same file, minus the lines for the deleted ref. Is there a reason we need to do any processing at all? I guess the answer is that you want to avoid re-parsing the current file, and just write it out from memory. But don't we need to refresh the memory cache from disk under the lock anyway? That was the pack-refs race that I fixed recently. > Is the code that > I use to do this, in peel_entry(), safe to call for nonexistent > SHA1s (I would like it to return 0 if the SHA1 is invalid)?: > > o = parse_object(entry->u.value.sha1); > if (o->type == OBJ_TAG) { > o = deref_tag(o, entry->name, 0); > if (o) { > hashcpy(entry->u.value.peeled, o->sha1); > entry->flag |= REF_KNOWS_PEELED; > return 1; > } > } > return 0; I think this approach is safe, as parse_object will silently return NULL for a missing object. You do need to check for "o != NULL" in your conditional, though. > 3. This same change to repack_with_ref() means that it could become > significantly slower to delete a packed ref if the packed-ref file > is not fully-peeled. On the plus side, once this is done once, the > packed-refs files will be rewritten in fully-peeled form. But if > two versions of Git are being used in a repository, this cost could > be incurred repeatedly. Does anybody object? I think it's OK in concept. But I still am wondering if it would be simpler still to just pass the file through while locked. > 4. Should I change the locking policy as discussed in this thread?: > > http://article.gmane.org/gmane.comp.version-control.git/212131 I think it's overall a sane direction. It does increase lock contention slightly when two processes are deleting at the same time, but it would fix the weird corner cases I described (mostly deleted refs reappearing due to races). And the lock contention is already there in a fully-packed repo anyway. I.e., right now we read the packed-refs file and lock it if our to-delete ref is in there; with the proposed change, we would lock before even reading it. So the increased contention is only when two deleters race each other, _and_ one of them is not deleting a packed ref. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-16 9:34 ` Jeff King @ 2013-03-16 13:38 ` Michael Haggerty 2013-03-18 3:17 ` Michael Haggerty 2013-03-22 17:42 ` Jeff King 0 siblings, 2 replies; 17+ messages in thread From: Michael Haggerty @ 2013-03-16 13:38 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git discussion list On 03/16/2013 10:34 AM, Jeff King wrote: > On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote: > >> My patch series is nearly done. I will need another day or two to >> review and make it submission-ready, but I wanted to give you an idea of >> what I'm up to and I could also use your feedback on some points. > > I was just sending out my cleaned up patches to do do fully-peeled, too. > I think the duplication is OK, though, as your topic would be for > "master" and mine can go to "maint". Yes, though I'll have to rebase mine on top of yours. >> * Implement fully-peeled packed-refs files, as discussed upthread: peel >> references outside of refs/tags, and keep rigorous track of >> which references have REF_KNOWS_PEELED. > > Looks pretty similar to mine. We even added similar tests. > > I notice that you do the "add REF_KNOWS_PEELED if we actually got a peel > line" optimization. I didn't bother, because this will never be > triggered by a git-written file (either we do not write the entry, or we > set fully-peeled). I wonder if any other implementation does peel every > ref, though. We read the peeled value for refs outside of refs/tags even if fully-peeled is not set, so it seemed somehow inconsistent not to set REF_KNOWS_PEELED. But I agree that Git itself should never write such entries, except of course for the interval between your first patch and your second patch ;-) >> * Change pack-refs to use the peeled information from ref_entries if it >> is available, rather than looking up the references again. > > I don't know that it matters from a performance standpoint (we don't > really care how long pack-refs takes, as long as it is within reason, > because it is run as part of a gc). But it would mean that any errors > in the file are propagated from one run of pack-refs to the next. I > don't know if we want to spend the extra time to be more robust. I thought about this argument too. Either way is OK with me. BTW would it make sense to build a consistency check of peeled references into fsck? >> * repack_without_ref() writes peeled refs to the packed-refs file. >> Use the old values if available; otherwise look them up. > > Whereas here we probably _do_ want the performance optimization, because > we do care about the performance of a ref deletion. Agreed. >> 1. There are multiple functions for peeling refs: >> peel_ref() (and peel_object(), which was extracted from it); >> peel_entry() (derived from pack-refs.c:pack_one_ref()). It would be >> nice to combine these into the One True Function. But given the >> problem that you mentioned above (which is rooted in parts of the >> code that I don't know) I don't know what that version should do. > > I'm not sure I understand the question. Just skimming your patches, it > looks like peel_entry could just call peel_object? I believe that the version in peel_object() is the one that you optimized in your now-famous 435c833 patches, which your earlier email said was connected to a null pointer dereference. I'm not at all familiar with the API being used there, so I don't know whether the two versions are interchangeable, or whether you need to fix your optimization, or whether your optimization will need to be reverted because of the problem you discovered. >> 2. repack_without_ref() now writes peeled refs, peeling them if >> necessary. It does this *without* referring to the loose refs >> to avoid having to load all of the loose references, which can be >> time-consuming. But this means that it might try to lookup SHA1s >> that are not the current value of the corresponding reference any >> more, and might even have been garbage collected. > > Yuck. I really wonder if repack_without_ref should literally just be > writing out the exact same file, minus the lines for the deleted ref. > Is there a reason we need to do any processing at all? I guess the > answer is that you want to avoid re-parsing the current file, and just > write it out from memory. But don't we need to refresh the memory cache > from disk under the lock anyway? That was the pack-refs race that I > fixed recently. It would certainly be thinkable to rewrite the file textually without parsing it again. But I did it this way for two reasons: 1. It would be better to have one packed-refs file parser and writer rather than two. (I'm working towards unifying the two writers, and will continue once I understand their differences.) 2. Given how peeled references were added, it seems dangerous to read, modify, and write data that might be in a future format that we don't entirely understand. For example, suppose a new feature is added to mark Junio's favorite tags: # pack-refs with: peeled fully-peeled junios-favorites \n ce432cac30f98b291be609a0fc974042a2156f55 refs/heads/master 83b3dd7151e7eb0e5ac62ee03aca7e6bccaa8d07 refs/tags/evil-dogs ^e1d04f8aad59397cbd55e72bf8a1bd75606f5350 7ed863a85a6ce2c4ac4476848310b8f917ab41f9 refs/tags/lolcats ^990f856a62a24bfd56bac1f5e4581381369e4ede ^^^junios-favorite b0517166ae2ad92f3b17638cbdee0f04b8170d99 refs/tags/nonsense ^4baff50551545e2b6825973ec37bcaf03edb95fe This would be backwards-compatible with the current code (granted, one would lose the favorite information if the file is rewritten with an older version of the code). But if we delete the lolcats tag textually, we would cause the favorite annotation to be moved to a different tag and thereby corrupt the data. >> Is the code that >> I use to do this, in peel_entry(), safe to call for nonexistent >> SHA1s (I would like it to return 0 if the SHA1 is invalid)?: >> >> o = parse_object(entry->u.value.sha1); >> if (o->type == OBJ_TAG) { >> o = deref_tag(o, entry->name, 0); >> if (o) { >> hashcpy(entry->u.value.peeled, o->sha1); >> entry->flag |= REF_KNOWS_PEELED; >> return 1; >> } >> } >> return 0; > > I think this approach is safe, as parse_object will silently return NULL > for a missing object. You do need to check for "o != NULL" in your > conditional, though. Thanks; will fix. >> 3. This same change to repack_with_ref() means that it could become >> significantly slower to delete a packed ref if the packed-ref file >> is not fully-peeled. On the plus side, once this is done once, the >> packed-refs files will be rewritten in fully-peeled form. But if >> two versions of Git are being used in a repository, this cost could >> be incurred repeatedly. Does anybody object? > > I think it's OK in concept. But I still am wondering if it would be > simpler still to just pass the file through while locked. See above. >> 4. Should I change the locking policy as discussed in this thread?: >> >> http://article.gmane.org/gmane.comp.version-control.git/212131 > > I think it's overall a sane direction. It does increase lock contention > slightly when two processes are deleting at the same time, but it would > fix the weird corner cases I described (mostly deleted refs reappearing > due to races). And the lock contention is already there in a > fully-packed repo anyway. I.e., right now we read the packed-refs file > and lock it if our to-delete ref is in there; with the proposed change, > we would lock before even reading it. So the increased contention is > only when two deleters race each other, _and_ one of them is not > deleting a packed ref. OK, I'll work on that too. Thanks for your feedback! Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-16 13:38 ` Michael Haggerty @ 2013-03-18 3:17 ` Michael Haggerty 2013-03-22 17:42 ` Jeff King 1 sibling, 0 replies; 17+ messages in thread From: Michael Haggerty @ 2013-03-18 3:17 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git discussion list On 03/16/2013 02:38 PM, Michael Haggerty wrote: > On 03/16/2013 10:34 AM, Jeff King wrote: >> On Sat, Mar 16, 2013 at 09:48:42AM +0100, Michael Haggerty wrote: >> >>> My patch series is nearly done. I will need another day or two to >>> review and make it submission-ready, but I wanted to give you an idea of >>> what I'm up to and I could also use your feedback on some points. I will wait for the dust to settle on Peff's "peel-ref optimization fixes" patches, then rebase my series on top of his before submitting. The rest of my series is not so urgent so I don't think the delay will be a problem. Michael -- Michael Haggerty mhagger@alum.mit.edu http://softwareswirl.blogspot.com/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Tag peeling peculiarities 2013-03-16 13:38 ` Michael Haggerty 2013-03-18 3:17 ` Michael Haggerty @ 2013-03-22 17:42 ` Jeff King 1 sibling, 0 replies; 17+ messages in thread From: Jeff King @ 2013-03-22 17:42 UTC (permalink / raw) To: Michael Haggerty; +Cc: Junio C Hamano, git discussion list [this email is from last week, and I think most of was responded to in other parts of the thread, but there were a few loose ends] On Sat, Mar 16, 2013 at 02:38:12PM +0100, Michael Haggerty wrote: > >> * Change pack-refs to use the peeled information from ref_entries if it > >> is available, rather than looking up the references again. > > > > I don't know that it matters from a performance standpoint (we don't > > really care how long pack-refs takes, as long as it is within reason, > > because it is run as part of a gc). But it would mean that any errors > > in the file are propagated from one run of pack-refs to the next. I > > don't know if we want to spend the extra time to be more robust. > > I thought about this argument too. Either way is OK with me. BTW would > it make sense to build a consistency check of peeled references into fsck? Yeah, I do think an fsck check makes sense. It should not be expensive, and if there is a realistic corruption/health check for a repo, it makes sense to me for it to be part of fsck. I do not recall many incidents of packed-refs corruption in the history of the list, but this fairly recent one comes to mind: http://thread.gmane.org/gmane.comp.version-control.git/217065 On the other hand, if it is just as cheap to rewrite the file as it is to do the health checks, I wonder if the advice should simply be "run pack-refs again" (and doing so should always start from scratch, not trusting the existing version). > > Yuck. I really wonder if repack_without_ref should literally just be > > writing out the exact same file, minus the lines for the deleted ref. > > Is there a reason we need to do any processing at all? I guess the > > answer is that you want to avoid re-parsing the current file, and just > > write it out from memory. But don't we need to refresh the memory cache > > from disk under the lock anyway? That was the pack-refs race that I > > fixed recently. > > It would certainly be thinkable to rewrite the file textually without > parsing it again. But I did it this way for two reasons: > > 1. It would be better to have one packed-refs file parser and writer > rather than two. (I'm working towards unifying the two writers, and > will continue once I understand their differences.) I see your point, though I also feel that with the right refactoring, they could share the parser. And the second writer would be mostly a pass-through. But much more compelling is reason 2... > 2. Given how peeled references were added, it seems dangerous to read, > modify, and write data that might be in a future format that we don't > entirely understand. For example, suppose a new feature is added to > mark Junio's favorite tags: > > # pack-refs with: peeled fully-peeled junios-favorites \n > ce432cac30f98b291be609a0fc974042a2156f55 refs/heads/master > 83b3dd7151e7eb0e5ac62ee03aca7e6bccaa8d07 refs/tags/evil-dogs > ^e1d04f8aad59397cbd55e72bf8a1bd75606f5350 > 7ed863a85a6ce2c4ac4476848310b8f917ab41f9 refs/tags/lolcats > ^990f856a62a24bfd56bac1f5e4581381369e4ede > ^^^junios-favorite > b0517166ae2ad92f3b17638cbdee0f04b8170d99 refs/tags/nonsense > ^4baff50551545e2b6825973ec37bcaf03edb95fe Hmm. Good point. It is tempting to make a rule like "extensions that are specific to a ref must come after the ref but before the next ref". And then the writer could simply drop any lines between the to-delete ref and the one that follows it. I think that would work OK in practice, but I am worried that it would paint us into a corner later on (after all, we do not know what extensions will be added in the future). The only thing I can think of that would break is something where a ref or its extension depends on previous entries. E.g., we start prefix-compressing the ref names to save space. Of course that would break backwards compatibility horribly anyway, so it's a no-go. But maybe some extension would want to refer to a prior ref. I dunno. -Peff ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2013-03-22 17:43 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-03-13 14:59 Tag peeling peculiarities Michael Haggerty 2013-03-13 15:34 ` Michael Haggerty 2013-03-13 17:29 ` Junio C Hamano 2013-03-13 21:58 ` Jeff King 2013-03-14 4:41 ` Michael Haggerty 2013-03-14 5:24 ` Jeff King 2013-03-14 5:32 ` Jeff King 2013-03-14 15:14 ` Junio C Hamano 2013-03-14 11:28 ` Michael Haggerty 2013-03-14 13:40 ` Jeff King 2013-03-15 5:12 ` Michael Haggerty 2013-03-15 16:28 ` Junio C Hamano 2013-03-16 8:48 ` Michael Haggerty 2013-03-16 9:34 ` Jeff King 2013-03-16 13:38 ` Michael Haggerty 2013-03-18 3:17 ` Michael Haggerty 2013-03-22 17:42 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).