git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <junkio@cox.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: re-validate data we copy from elsewhere.
Date: Sat, 02 Sep 2006 02:42:20 -0700	[thread overview]
Message-ID: <7vwt8m1u6b.fsf@assigned-by-dhcp.cox.net> (raw)
In-Reply-To: <20060902045246.GB25146@spearce.org> (Shawn Pearce's message of "Sat, 2 Sep 2006 00:52:46 -0400")

Shawn Pearce <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>> It might be worthwhile to disable revalidate reused objects
>> individually and instead scan and checksum the entire .pack file
>> when the number of objects being reused exceeds certain
>> threshold, relative to the number of objects in existing pack,
>> perhaps.
>
> Correct me if I'm wrong but didn't this revalidate check happen
> because the SHA1 of the pack was correct but there was a bad bit
> in the zlib stream?

There are two more or less independent problems and you are
correct to point out that summing the entire .pack does not
catch one class of problem while it does catch the other.

The Linus's theory goes like this:

 (1) A bit in an existing pack was damaged somehow.  It might have
     happened on the mothership machine when it was first created,
     or after it was read and copied to the notebook via rsync.

 (2) A 'repack -a -d' was done in a repository that had that
     damanged pack, it decided to reuse a deflated object from the
     existing, damaged pack, and copied that deflated
     representation into a newly created pack without validating.

 (3) The 'repack -a -d', when finishing the newly created pack,
     computed a checksum over the whole new pack and wrote the
     SHA-1 checksum in it.

Now, sha1_file() has check_packed_git_idx() that makes sure .idx
file is not corrupted; it runs the checksum over the whole .idx
data when it first maps in and makes sure the sum matches what
is recorded at the end.  So if the corruption in (1) happened to
the .idx file, 'repack -a -d' in (2) would have noticed and
refused to use the corresponding .pack.

The problem is that there is however no corresponding "checksum
over the whole file before use" for .pack file during a normal
operation of use_packed_git().  Otherwise we would have caught
the corruption in the existing pack and prevented step (2) from
propagating the error.

The idea proposed by Linus and implemented in the patch in this
thread is to mitigate this by revalidating the individual pieces
we copy in (2).  If we copy out most of what is in the existing
pack, however, it may be cheaper to do the "whole file checksum
before use" instead.

On the other hand, the "alpha particle at the right moment"
theory goes like this:

 (1) write_object() gave the buffer to sha1write_compressed();

 (2) sha1write_compressed() asked zlib to deflate the data and
     received the result in buffer pointed by void *out;

 (3) bit flipped in that buffer, after zlib finished writing
     into it with the CRC;

 (4) sha1write() wrote out the now-corrupt buffer, while
     computing the correct checksum for the end result (which is
     now corrupt).

This breakage will not be caught unless we revalidate everything
we copy out to the new pack.


-- 
VGER BF report: U 0.5

  reply	other threads:[~2006-09-02  9:42 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <9e4733910608290943g6aa79855q62b98caf4f19510@mail.gmail.com>
     [not found] ` <20060829165811.GB21729@spearce.org>
     [not found]   ` <9e4733910608291037k2d9fb791v18abc19bdddf5e89@mail.gmail.com>
     [not found]     ` <20060829175819.GE21729@spearce.org>
     [not found]       ` <9e4733910608291155g782953bbv5df1b74878f4fcf1@mail.gmail.com>
     [not found]         ` <20060829190548.GK21729@spearce.org>
     [not found]           ` <9e4733910608291252q130fc723r945e6ab906ca6969@mail.gmail.com>
     [not found]             ` <20060829232007.GC22935@spearce.org>
     [not found]               ` <9e4733910608291807q9b896e4sdbfaa9e49de58c2b@mail.gmail.com>
2006-08-30  1:51                 ` Mozilla .git tree Shawn Pearce
2006-08-30  2:25                   ` Shawn Pearce
2006-08-30  2:58                   ` Jon Smirl
2006-08-30  3:10                     ` Shawn Pearce
2006-08-30  3:27                       ` Jon Smirl
2006-08-30  5:53                       ` Nicolas Pitre
2006-08-30 11:42                         ` Junio C Hamano
2006-09-01  7:42                           ` Junio C Hamano
2006-09-02  1:19                             ` Shawn Pearce
2006-09-02  4:01                               ` Junio C Hamano
2006-09-02  4:39                                 ` Shawn Pearce
2006-09-02 11:06                                   ` Junio C Hamano
2006-09-02 14:20                                     ` Jon Smirl
2006-09-02 17:39                                       ` Shawn Pearce
2006-09-02 18:56                                         ` Linus Torvalds
2006-09-02 20:53                                           ` Junio C Hamano
2006-09-02 17:44                                     ` Shawn Pearce
2006-09-02  2:04                             ` Shawn Pearce
2006-09-02 11:02                               ` Junio C Hamano
2006-09-02 17:51                                 ` Shawn Pearce
2006-09-02 20:55                                   ` Junio C Hamano
2006-09-03  3:54                                     ` Shawn Pearce
2006-09-01 17:45                           ` A Large Angry SCM
2006-09-01 18:35                             ` Linus Torvalds
2006-09-01 19:56                               ` Junio C Hamano
2006-09-01 23:14                               ` [PATCH] pack-objects: re-validate data we copy from elsewhere Junio C Hamano
2006-09-02  0:23                                 ` Linus Torvalds
2006-09-02  1:39                                   ` VGER BF report? Johannes Schindelin
2006-09-02  5:58                                     ` Sam Ravnborg
2006-09-02  1:52                                   ` [PATCH] pack-objects: re-validate data we copy from elsewhere Junio C Hamano
2006-09-02  3:52                                   ` Junio C Hamano
2006-09-02  4:52                                     ` Shawn Pearce
2006-09-02  9:42                                       ` Junio C Hamano [this message]
2006-09-02 17:43                                         ` Linus Torvalds
2006-09-02 10:09                                       ` Junio C Hamano
2006-09-02 17:54                                         ` Shawn Pearce
2006-09-03 21:00                                           ` Junio C Hamano
2006-09-04  4:10                                             ` Shawn Pearce
2006-09-04  5:50                                               ` Junio C Hamano
2006-09-04  6:44                                                 ` Shawn Pearce
2006-09-04  7:39                                                   ` Junio C Hamano
2006-09-03  0:27                                         ` Linus Torvalds
2006-09-03  0:32                                           ` Junio C Hamano
2006-09-05  8:12                                           ` Junio C Hamano
2006-09-02 18:43                                     ` Linus Torvalds
2006-09-02 20:56                                       ` Junio C Hamano
2006-09-03 21:48                                       ` Junio C Hamano
2006-09-03 22:00                                         ` Linus Torvalds
2006-09-03 22:16                                           ` Linus Torvalds
2006-09-03 22:34                                           ` Junio C Hamano
2006-09-04  4:06                                             ` Junio C Hamano
2006-09-04 15:19                                               ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7vwt8m1u6b.fsf@assigned-by-dhcp.cox.net \
    --to=junkio@cox.net \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).