From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
PJ Hyett <pjhyett@gmail.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>,
git@vger.kernel.org
Subject: Re: Bad objects error since upgrading GitHub servers to 1.6.1
Date: Wed, 28 Jan 2009 08:09:00 -0800 [thread overview]
Message-ID: <20090128160900.GJ1321@spearce.org> (raw)
In-Reply-To: <7vd4e7x5ov.fsf@gitster.siamese.dyndns.org>
Junio C Hamano <gitster@pobox.com> wrote:
>
> I've been toying with an idea for an alternative solution, and need
> somebody competent to bounce it around with.
Heh, then we need to wait for Nico... :-)
> pack-objects ends up doing eventually
>
> rev-list --objects $send1 $send2 $send3 ... --not $have1 $have2 ...
>
> which lists commits and associated objects reachable from $sendN,
> excluding the ones that are reachable from $haveN.
>
> The tentative solution Björn Steinbrink and I came up with excludes
> missing commit from $haveN to avoid rev-list machinery to barf, but it
> violates the ref-object contract as I explained to Björn in my other
> message.
Oh, OK, I now _finally_ understand what you were trying to say
by the reachability thing. I kept scratching my head trying to
understand you, and was going to say something stupid on list;
but waited because I just didn't get what the big deal was...
Its the crash in rev-list that you were worried about.
> Checking if each commit is reachable from any of the refs is quite
> expensive, and it would especially be so if it is done once per ".have"
> and real ref we receive from the other end.
Yup.
> An alternative is to realize that rev-list traversal already does
> something quite similar to what is needed to prove if these ".have"s are
> reachable from refs when listing the reachable objects. This computation
> is what it needs to do anyway, so if we teach rev-list to ignore missing
> or broken chain while traversing negative refs, we do not have to incur
> any overhead over existing code.
EXACTLY.
JGit does this.
The functional equivilant of rev-list in JGit will by default
throw an exception if any object is missing when we try to walk it.
That includes things we've painted UNINTERESTING, as it is a sure
sign of repository corruption.
However; our equivilant of pack-objects can toggle what you are
calling "ignore-missing-negative" when it starts enumeration.
Any UNINTERESTING object which is missing or failed to parse is
simply tossed aside. Yes, the pack may be larger than necessary
like in Peff's example of:
Q-R
/
D--E
\
A-C
If the other side has C reachable, we are pushing R, and we have
C but are missing A, we'll "over push" D-E, but its still a clean
and valid push. Its no worse than we were before the ".have" came
about, or if C hadn't been downloaded locally at all. (Of course
your tell-me-more extension would help fix this over-push, but lets
not get off topic.)
IMHO, this corruption of A is harmless if C isn't reachable.
It isn't really local corruption unless C was reachable by a ref.
But we don't tend to see much corruption like that, and if it did
exist, it would show up during *other* operations that access a
larger set of local refs, such as "git gc".
> I have a mild suspicion that it may even be the right thing to ignore them
> unconditionally, and it might even match the intention of Linus's original
> code. That would make many hunks in this patch much simpler.
I don't think its right to ignore broken UNINTERESTING chains all
of the time. Today we would see fatal errors if I asked for
git log R ^C
and A was missing, but R and C are both local refs. I still want
to see that fatal error. Its a local corruption that should be
raised quickly to the user. In fact by A missing we'd compute the
wrong result and produce D-E too, which is wrong.
IMHO, the *only* time this missing uninteresting A is safe is
during send-pack, upload-pack, or bundle creation, where you are
bringing the other side up to R by transferring any amount of data
necessary to reach that goal. Which is why JGit enables this.
(Though at the API level we do let the caller flag if they want
the error to be fatal instead, but AFAIK nobody sets it for "fatal".)
FWIW, Linus' most recent message on this thread about hoisting the
UNINTERESTING test up sooner makes sense too.
> The evidences behind this suspicion are found in a handful of places in
> revision.c. mark_blob_uninteresting() does not complain if the caller
> fails to find the blob. mark_tree_uninteresting() does not, either.
> mark_parents_uninteresting() does not, either, and it even has a comment
> that strongly suggests the original intention was not to care about
> missing UNINTERESTING objects.
That feels wrong to me... given the "git log R ^C" example I give above.
--
Shawn.
next prev parent reply other threads:[~2009-01-28 16:10 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-27 23:04 Bad objects error since upgrading GitHub servers to 1.6.1 PJ Hyett
2009-01-27 23:10 ` PJ Hyett
2009-01-27 23:37 ` Johannes Schindelin
2009-01-27 23:39 ` Shawn O. Pearce
2009-01-27 23:51 ` Junio C Hamano
2009-01-28 0:15 ` PJ Hyett
2009-01-28 0:34 ` PJ Hyett
2009-01-28 1:06 ` Junio C Hamano
2009-01-28 1:32 ` Junio C Hamano
2009-01-28 1:38 ` [PATCH] send-pack: Filter unknown commits from alternates of the remote Björn Steinbrink
2009-01-28 1:47 ` Junio C Hamano
2009-01-28 3:33 ` Junio C Hamano
2009-01-28 3:58 ` Björn Steinbrink
2009-01-28 4:13 ` Junio C Hamano
2009-01-28 4:32 ` Junio C Hamano
2009-01-28 1:44 ` Bad objects error since upgrading GitHub servers to 1.6.1 Junio C Hamano
2009-01-28 1:57 ` PJ Hyett
2009-01-28 2:02 ` Shawn O. Pearce
2009-01-28 3:09 ` Junio C Hamano
2009-01-28 3:30 ` Shawn O. Pearce
2009-01-28 3:52 ` Stephen Bannasch
2009-01-28 3:57 ` Shawn O. Pearce
2009-01-28 5:44 ` Junio C Hamano
2009-01-28 4:38 ` Junio C Hamano
2009-01-28 4:41 ` Shawn O. Pearce
2009-01-28 7:14 ` Junio C Hamano
2009-01-28 7:41 ` Junio C Hamano
2009-01-28 7:51 ` [PATCH 1/2] send-pack: do not send unknown object name from ".have" to pack-objects Junio C Hamano
2009-01-28 15:45 ` Bad objects error since upgrading GitHub servers to 1.6.1 Linus Torvalds
2009-01-28 19:00 ` Junio C Hamano
2009-01-28 7:55 ` Jeff King
2009-01-28 8:05 ` Junio C Hamano
2009-01-28 8:17 ` Jeff King
2009-01-28 16:16 ` Shawn O. Pearce
2009-01-28 18:16 ` Jeff King
2009-01-28 18:26 ` Junio C Hamano
2009-01-28 8:22 ` Junio C Hamano
2009-01-28 9:24 ` Jeff King
2009-01-28 16:09 ` Shawn O. Pearce [this message]
2009-01-28 16:38 ` Nicolas Pitre
2009-01-28 18:11 ` Jeff King
2009-01-28 1:00 ` Linus Torvalds
2009-01-28 1:15 ` Björn Steinbrink
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090128160900.GJ1321@spearce.org \
--to=spearce@spearce.org \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pjhyett@gmail.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).