All of lore.kernel.org
 help / color / mirror / Atom feed
From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Nicolas Pitre <nico@cam.org>, Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Wed, 13 Feb 2008 09:11:29 +0100	[thread overview]
Message-ID: <20080213081128.GA27730@auto.tuwien.ac.at> (raw)
In-Reply-To: <20080213074209.GG24004@spearce.org>

On Wed, Feb 13, 2008 at 02:42:09AM -0500, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > I think this is a good idea to always have some sanity checks on any 
> > incoming objects so to make sure they're well formed and valid before 
> > giving them a SHA1 value, and bail out as soon as any error is found.  
> 
> When we get the raw data for an object so we can compute its SHA-1
> and/or write its loose object to disk we should first verify its
> content is sane, then do the SHA-1/store loose.
> 
[...]
> > As to making sure those objects are well connected... well this is a 
> > technically different issue entirely, and I wonder if a special mode to 
> > fsck might not be a better solution.
> 
> Nah, just do what quickfetch does in builtin-fetch.c, but run it
> in receive-pack, between unpack() and execute_commands():
> 
> 	rev-list --quiet --objects $new... --not --all
> 
> If it aborts, reachability testing failed and the push is rejected
> without updating any refs.  Yes your repository now has objects
> that are missing things, but none of those are considered to be
> reachable, so this isn't a big deal.  They will get cleaned up on
> the next `gc --prune`, whenever that is.

This would mean, that we must make git-rev-list and git-pack-objects
not segfault on incorrect links between objects.

> In this configuration (--quiet) rev-list tries to be pretty low
> on its memory usage, it doesn't save buffers, etc.  Further since
> everything that is already considered reachable is not interesting,
> we are only doing a walk over the objects that we just received,
> not our entire ODB.  Its also after index-pack exited, so we just
> freed up a good chunk of memory.
> 
> Rememeber we are talking about receive-pack here.  The cost on
> the to perform the rev-list is lower than the cost will be to pack
> these objects for distribution back to just one client.  Since this
> is a server of some sorts (otherwise why did you push here?), odds
> are its going to be doing a lot of packing requests for clients to
> receive these newly uploaded objects by the native git protocol.
> This new rev-list is nothing compared to that already existing load.
> And if your OS is any good the just created .idx and .pack is still
> in OS buffer cache, so there shouldn't be any additional disk IO.
> 
> Yes, we could make this optional in receive-pack, but really I don't
> see a reason to.  Just run it.  The client shouldn't be giving us
> unreachable crap.

Looks sane to me.

mfg Martin Kögler

  reply	other threads:[~2008-02-13  8:12 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-10 17:58 [RFC Patch] Preventing corrupt objects from entering the repository Martin Koegler
2008-02-11  0:00 ` Junio C Hamano
2008-02-11  0:33   ` Nicolas Pitre
2008-02-11 19:56     ` Martin Koegler
2008-02-11 20:41       ` Nicolas Pitre
2008-02-11 21:58         ` Martin Koegler
2008-02-12 16:02           ` Nicolas Pitre
2008-02-12 19:04             ` Martin Koegler
2008-02-12 20:22               ` Nicolas Pitre
2008-02-12 21:38                 ` Martin Koegler
2008-02-12 21:51                   ` Nicolas Pitre
2008-02-13  6:20                     ` Shawn O. Pearce
2008-02-13  7:39                       ` Martin Koegler
2008-02-14  9:00                         ` [RFC PATCH] Remove object-refs from fsck Shawn O. Pearce
2008-02-14 19:07                           ` Martin Koegler
2008-02-13  7:42             ` [RFC Patch] Preventing corrupt objects from entering the repository Shawn O. Pearce
2008-02-13  8:11               ` Martin Koegler [this message]
2008-02-13 12:01                 ` Johannes Schindelin
2008-02-14  6:16                   ` Shawn O. Pearce
2008-02-14 19:04                   ` Martin Koegler
2008-02-15  0:06                     ` Johannes Schindelin
2008-02-15  7:18                       ` Martin Koegler
2008-02-12  7:20   ` Martin Koegler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080213081128.GA27730@auto.tuwien.ac.at \
    --to=mkoegler@auto.tuwien.ac.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nico@cam.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.