git.vger.kernel.org archive mirror
* Funnies with "git fetch"
@ 2011-09-01 17:53 Junio C Hamano
From: Junio C Hamano @ 2011-09-01 17:53 UTC
  To: git

I just did this in an empty directory.

    $ git init src
    $ cd src
    $ echo hello >greetings ; git add . ; git commit -m greetings
    $ S=$(git rev-parse :greetings | sed -e 's|^..|&/|')
    $ X=$(echo bye | git hash-object -w --stdin | sed -e 's|^..|&/|')
    $ mv -f .git/objects/$X .git/objects/$S

The tip commit _thinks_ it has "greetings" that contains "hello", but
somebody replaced that object with a corrupt "bye" whose contents do not
match its own object name.

    $ git fsck
    error: sha1 mismatch ce013625030ba8dba906f756967f9e9ca394464a

    error: ce013625030ba8dba906f756967f9e9ca394464a: object corrupt or missing
    missing blob ce013625030ba8dba906f756967f9e9ca394464a

The "hello" blob is ce0136, and the tree contained in HEAD expects "hello"
in that loose object file, but fsck notices that the contents do not hash
to the name of the file.
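For readers unfamiliar with the check fsck performs here: a loose object file holds the zlib-compressed string `<type> <size>\0<content>`, and its path is the SHA-1 of that uncompressed string, split after the first two hex digits (which is what the `sed -e 's|^..|&/|'` above produces). Below is a minimal sketch of that verification. It assumes the classic SHA-1 loose-object layout, and the helper names (`object_id`, `loose_path`, `write_loose`, `verify_loose_object`) are hypothetical, not part of git:

```python
import hashlib
import os
import tempfile
import zlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object name over the header '<type> <size>\\0' plus the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def loose_path(objects_dir: str, oid: str) -> str:
    """Mirror sed 's|^..|&/|': the first two hex digits become the directory."""
    return os.path.join(objects_dir, oid[:2], oid[2:])


def write_loose(objects_dir: str, oid: str, obj_type: str, content: bytes) -> None:
    """Store content under the name oid, which may be the wrong name (like the mv above)."""
    path = loose_path(objects_dir, oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    header = f"{obj_type} {len(content)}\0".encode()
    with open(path, "wb") as f:
        f.write(zlib.compress(header + content))


def verify_loose_object(objects_dir: str, oid: str) -> bool:
    """Recompute the name from the file's data; False is fsck's 'sha1 mismatch'."""
    with open(loose_path(objects_dir, oid), "rb") as f:
        data = zlib.decompress(f.read())
    return hashlib.sha1(data).hexdigest() == oid


if __name__ == "__main__":
    hello = object_id("blob", b"hello\n")
    with tempfile.TemporaryDirectory() as objects:
        # Reproduce the corruption: "bye" data stored under the "hello" name.
        write_loose(objects, hello, "blob", b"bye\n")
        print(hello, "ok" if verify_loose_object(objects, hello) else "sha1 mismatch")
```

The check needs nothing but the file itself, which is why fsck on src catches the corruption locally.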

So far, so good. Let's see what others see when they interact with this
repository.

    $ cd ../
    $ git init dst
    $ cd dst
    $ git config receive.fsckobjects true
    $ git remote add origin ../src
    $ git config branch.master.remote origin
    $ git config branch.master.merge refs/heads/master
    $ git fetch
    remote: Counting objects: 3, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From ../src
     * [new branch]      master     -> origin/master

Oops? If we run "fsck" at this point, we would notice the breakage:

    $ git fsck
    notice: HEAD points to an unborn branch (master)
    broken link from    tree 1c93b84c9756b083e5751db1f9ffa7f80ac667e2
                  to    blob ce013625030ba8dba906f756967f9e9ca394464a
    missing blob ce013625030ba8dba906f756967f9e9ca394464a
    dangling blob b023018cabc396e7692c70bbf5784a93d3f738ab

Here, b02301 is the true identity of the "bye" blob that the src repository
crafted to fool us into believing it is "hello".  We can see that the
object transfer sent three objects, and because we only propagate the
contents and have the receiving end compute the object names from the
data, we received b02301 but not ce0136.
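The point that the receiver names objects itself can be shown with a toy model. The transfer carries only (type, content) pairs, never names, so the name the sender had in mind is lost on the way. Below is a minimal sketch. The pack is modelled as a plain Python list rather than the real pack format, and `name_received` is a hypothetical helper:

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 object name over the header '<type> <size>\\0' plus the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def name_received(pack: list[tuple[str, bytes]]) -> set[str]:
    """Name each received object from its own data, as the receiving end does."""
    return {object_id(obj_type, content) for obj_type, content in pack}


# The sender meant to send the "hello" blob but read "bye" from the corrupt file.
pack = [("blob", b"bye\n")]
received = name_received(pack)
wanted = "ce013625030ba8dba906f756967f9e9ca394464a"  # the name the tree points at
print(sorted(received))    # only the true name of the "bye" data
print(wanted in received)  # False: the blob the tree needs never arrived
```

Each object in such a transfer is individually well-formed, so per-object checks pass. Only a check against what the history references reveals the gap.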

    $ ls .git/objects/??/?*
    .git/objects/1c/93b84c9756b083e5751db1f9ffa7f80ac667e2
    .git/objects/61/5d8c76daef6744635c87fb312a76a5ec7462ea
    .git/objects/b0/23018cabc396e7692c70bbf5784a93d3f738ab

As a side note, if we did "git pull" instead of "git fetch", we would have
also noticed the breakage, like so:

    $ git pull
    remote: Counting objects: 3, done.
    remote: Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From ../src
     * [new branch]      master     -> origin/master
    error: unable to find ce013625030ba8dba906f756967f9e9ca394464a
    error: unable to read sha1 file of greetings (ce013625030ba...)

But the straight "fetch" did not notice anything fishy going on. Shouldn't
we have?  Even though we may be reasonably safe, unpack-objects should be
able to do better, especially under receive.fsckobjects option.

Also, as a side note, if we set

    $ git config fetch.unpacklimit 1

before we run this "git fetch", we end up storing a single pack, whose
contents are the same three objects above (as expected), and we do not get
any indication of an error from the command.

I think the breakages are:

 - The sending side does not give any indication that it _wanted_ to send
   ce0136 but couldn't, and ended up sending another object;

 - The pack data sent over the wire was self consistent (no breakage here)
   and sent three well-formed objects, but it was inconsistent with
   respect to what history was being transferred (breakage is here);

 - The receiving end did not notice the inconsistency.

The first one is of lower priority, as the client side should be able
to notice an upstream with corruption in any case. Perhaps after asking
for objects between "have" and "want", "git fetch" should verify that it
can fully walk the subhistory that was supposed to be transferred down to
the blob level?
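The check suggested above amounts to a reachability walk from the fetched tip. It follows commit to tree to blob, stops at objects we already "have", and reports any referenced object that is not present locally. Below is a minimal sketch over a toy in-memory object store. It uses the three object names from the `ls` output above, where 615d8c is, by elimination, the commit (a root commit, so it has no parents). The dict layout and `find_missing` are hypothetical, not git's internal representation:

```python
def find_missing(tip, store, have=frozenset()):
    """Walk from tip down to blobs; return ids referenced but absent from store.

    store maps object id -> (type, list of referenced ids). Objects in `have`
    are assumed to be complete on our side already, so the walk skips them.
    """
    missing = []
    seen = set()
    stack = [tip]
    while stack:
        oid = stack.pop()
        if oid in seen or oid in have:
            continue
        seen.add(oid)
        if oid not in store:
            missing.append(oid)
            continue
        _obj_type, refs = store[oid]
        stack.extend(refs)
    return missing


COMMIT = "615d8c76daef6744635c87fb312a76a5ec7462ea"
TREE = "1c93b84c9756b083e5751db1f9ffa7f80ac667e2"
HELLO = "ce013625030ba8dba906f756967f9e9ca394464a"
BYE = "b023018cabc396e7692c70bbf5784a93d3f738ab"

# What the fetch left in dst: the tree still points at HELLO, but only BYE arrived.
store = {
    COMMIT: ("commit", [TREE]),
    TREE: ("tree", [HELLO]),
    BYE: ("blob", []),
}

missing = find_missing(COMMIT, store)
if missing:
    # A fetch doing this check could refuse to update the ref at this point.
    print("incomplete history; missing:", missing)
```

Stopping at "have" objects keeps the walk proportional to what was just transferred rather than to the whole history.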


Thread overview: 18+ messages
2011-09-01 17:53 Funnies with "git fetch" Junio C Hamano
2011-09-01 22:42 ` Junio C Hamano
2011-09-01 23:31   ` Jeff King
2011-09-02  3:09     ` Junio C Hamano
2011-09-02  3:28       ` Junio C Hamano
2011-09-02  5:03       ` Jeff King
2011-09-01 22:43 ` [PATCH 0/3] Verify the objects fetch obtained before updating ref Junio C Hamano
2011-09-01 22:43   ` [PATCH 1/3] list-objects: pass callback data to show_objects() Junio C Hamano
2011-09-01 22:43   ` [PATCH 2/3] rev-list --verify-object Junio C Hamano
2011-09-01 22:43   ` [PATCH 3/3] fetch: verify we have everything we need before updating our ref Junio C Hamano
2011-09-02  3:55     ` Nguyen Thai Ngoc Duy
2011-09-02  4:25       ` Junio C Hamano
2011-09-02 23:14         ` Junio C Hamano
2011-09-04 19:15 ` Funnies with "git fetch" Junio C Hamano
2011-09-05  2:21   ` [PATCH 0/3] Add fetch.fsckobjects Junio C Hamano
2011-09-05  2:21     ` [PATCH 1/3] fetch.fsckobjects: verify downloaded objects Junio C Hamano
2011-09-05  2:21     ` [PATCH 2/3] transfer.fsckobjects: unify fetch/receive.fsckobjects Junio C Hamano
2011-09-05  2:21     ` [PATCH 3/3] test: fetch/receive with fsckobjects Junio C Hamano
