From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: Funnies with "git fetch"
Date: Thu, 1 Sep 2011 19:31:08 -0400 [thread overview]
Message-ID: <20110901233108.GA9339@sigill.intra.peff.net> (raw)
In-Reply-To: <7vpqjjnau1.fsf@alter.siamese.dyndns.org>
On Thu, Sep 01, 2011 at 03:42:46PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > I think the breakages are:
> >
> > - The sending side does not give any indication that it _wanted_ to send
> > ce0136 but couldn't, and ended up sending another object;
This isn't fixable without the server re-hashing every outgoing object,
though, is it? It thinks it has ce0136, but the data is corrupted. The
only way to detect that is to rehash each object that we send. That's
pretty expensive when you consider decompression and delta
reconstruction.
Besides which, it doesn't help many malicious cases at all. It will
handle disk corruption or a malicious object-db tweak. But it does
nothing against an attacker who can trojan git, or who can spoof the
network session (which is probably hard to do if its straight ssh, but
keep in mind that traffic may be going through cleartext proxies).
So it seems that the receiving side is the only sensible place to put
such checks. It catches problems at more levels, and it can generally
afford the extra processing load (under the assumption that pushes to
servers are smaller and less frequent than fetches).
> > - The pack data sent over the wire was self consistent (no breakage here)
> > and sent three well-formed objects, but it was inconsistent with
> > respect to what history was being transferred (breakage is here);
I don't think this matters. We have thin packs, after all. What goes on
the wire doesn't have to be sensible unto itself; it's about whether
the receiving end can do something useful with it.
> > - The receiving end did not notice the inconsistency.
> >
> > The first one is of the lower priority, as the client side should be able
> > to notice an upstream with corruption in any case. Perhaps after asking
> > for objects between "have" and "want", "git fetch" should verify that it
> > can fully walk the subhistory that was supposed to be transferred down to
> > the blob level?
That seems like a sensible improvement to me.
> So I have a series to fix the latter "more important" half I'll be sending
> out in this thread.
If I understand correctly, your series is just about checking that we
have newly-referenced blobs. We were already checking commits and trees,
and we should already be hashing individual objects when we index the
pack. Right?
In that case, the performance change should be negligible.
-Peff
next prev parent reply other threads:[~2011-09-01 23:31 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-01 17:53 Funnies with "git fetch" Junio C Hamano
2011-09-01 22:42 ` Junio C Hamano
2011-09-01 23:31 ` Jeff King [this message]
2011-09-02 3:09 ` Junio C Hamano
2011-09-02 3:28 ` Junio C Hamano
2011-09-02 5:03 ` Jeff King
2011-09-01 22:43 ` [PATCH 0/3] Verify the objects fetch obtained before updating ref Junio C Hamano
2011-09-01 22:43 ` [PATCH 1/3] list-objects: pass callback data to show_objects() Junio C Hamano
2011-09-01 22:43 ` [PATCH 2/3] rev-list --verify-object Junio C Hamano
2011-09-01 22:43 ` [PATCH 3/3] fetch: verify we have everything we need before updating our ref Junio C Hamano
2011-09-02 3:55 ` Nguyen Thai Ngoc Duy
2011-09-02 4:25 ` Junio C Hamano
2011-09-02 23:14 ` Junio C Hamano
2011-09-04 19:15 ` Funnies with "git fetch" Junio C Hamano
2011-09-05 2:21 ` [PATCH 0/3] Add fetch.fsckobjects Junio C Hamano
2011-09-05 2:21 ` [PATCH 1/3] fetch.fsckobjects: verify downloaded objects Junio C Hamano
2011-09-05 2:21 ` [PATCH 2/3] transfer.fsckobjects: unify fetch/receive.fsckobjects Junio C Hamano
2011-09-05 2:21 ` [PATCH 3/3] test: fetch/receive with fsckobjects Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110901233108.GA9339@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).