From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: Re: [PATCH] pack-objects: re-validate data we copy from elsewhere.
Date: Sat, 2 Sep 2006 00:52:46 -0400
Message-ID: <20060902045246.GB25146@spearce.org>
In-Reply-To: <7vd5ae3ox2.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> It might be worthwhile to disable revalidate reused objects
> individually and instead scan and checksum the entire .pack file
> when the number of objects being reused exceeds certain
> threshold, relative to the number of objects in existing pack,
> perhaps.

Correct me if I'm wrong, but wasn't this revalidation check added
because the SHA1 of the pack was correct but there was a bad bit
in the zlib stream?

If we are trying to detect such an error before removing the possibly
valid pack, how are we supposed to do that if we bypass the check
on larger packs?
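
The failure mode can be illustrated with a small sketch.  It assumes,
purely for illustration, that the damage was already present when the
trailing checksum was computed; the toy "pack" below is a single zlib
stream, not the real pack layout.

```python
import hashlib
import zlib

# Toy stand-in for a pack: one deflated payload (not the real pack format).
payload = b"blob contents " * 64
stream = bytearray(zlib.compress(payload))

# Damage the stream's Adler-32 trailer *before* the pack checksum is
# computed, e.g. a bad bit introduced while the pack was being written.
stream[-1] ^= 0x01
damaged = bytes(stream)

# The pack-level SHA-1 covers the already-damaged bytes, so a later
# whole-file checksum pass agrees with it.
pack_sha1 = hashlib.sha1(damaged).hexdigest()
checksum_ok = hashlib.sha1(damaged).hexdigest() == pack_sha1

# Only inflating the object exposes the problem.
try:
    zlib.decompress(damaged)
    inflate_ok = True
except zlib.error:
    inflate_ok = False
```

Here the checksum comparison succeeds while the inflate fails, which
is why a whole-pack checksum alone would not catch this case.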


I think the better thing to do here is to not repack objects which
are already contained in very large packs.  Just leave them be.

If the pack you are about to copy an object out of is over 25 MiB,
you aren't writing to stdout, and the object isn't needed as a
delta base in the new pack, then don't copy it.  Introduce a new
flag to git-pack-objects, such as "--max-source-pack-size=100"
(in MiB), which can be used to change this 25 MiB threshold;
setting it to 0 would act as "-a" does today.
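
Roughly, the rule could be sketched as the predicate below.  This is
only an illustration: the function and parameter names are made up,
and nothing here is existing git-pack-objects code.

```python
MIB = 1024 * 1024

def should_copy_object(source_pack_bytes, to_stdout, needed_as_delta_base,
                       max_source_pack_mib=25):
    """Return True if the object should be copied into the new pack.

    Hypothetical helper; max_source_pack_mib models the proposed
    --max-source-pack-size value.
    """
    # A limit of 0 disables the size check, i.e. behaves like "-a" today.
    if max_source_pack_mib == 0:
        return True
    # Objects in packs at or below the threshold are repacked as usual.
    if source_pack_bytes <= max_source_pack_mib * MIB:
        return True
    # Large source pack: copy only when streaming to stdout or when the
    # object is needed as a delta base in the new pack.
    return to_stdout or needed_as_delta_base
```

With the 25 MiB default, an object in a 30 MiB pack is left alone
unless it is needed as a delta base or the output goes to stdout;
raising the limit to 100 makes it eligible again.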

This way users can repack with 'git repack -a -d' as though it were
free, and much less frequently (such as once a year) combine their
medium-sized packs using a larger maximum threshold, while still
ignoring their really large historical packs.

Note that you are never bypassing the deflate validation; before
copying an object you *always* validate that it is correct, even if
the source pack SHA1 is correct.  But this time-consuming validation
should not be a big issue, as users shouldn't repack very large
packs very frequently with this strategy.  E.g. some kernel devs
might repack once a year with --max-source-pack-size=512 (512 MiB)
but during normal use accept the 25 MiB default and the slightly
larger number of small packs that result.
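
As a sketch of what "always validate before copying" might look like,
assuming that validation means the stream inflates cleanly to the
recorded size (the helper name and that definition of "valid" are
assumptions, not the actual patch):

```python
import zlib

def validated_copy(deflated, expected_size):
    """Return the deflated bytes unchanged, but only if they inflate cleanly.

    Hypothetical helper: raises ValueError instead of copying bad data.
    """
    inflater = zlib.decompressobj()
    try:
        data = inflater.decompress(deflated)
        data += inflater.flush()
    except zlib.error as exc:
        raise ValueError("corrupt deflate stream") from exc
    # The stream must end properly, with no trailing bytes after it.
    if not inflater.eof or inflater.unused_data:
        raise ValueError("truncated or over-long deflate stream")
    # The inflated length must match the size recorded for the object.
    if len(data) != expected_size:
        raise ValueError("inflated size mismatch")
    return deflated
```

The check runs on every copied object regardless of the pack-level
checksum; the size threshold only limits how often large packs are
subjected to it.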

-- 
Shawn.

