From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Nicolas Pitre <nico@cam.org>, Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Wed, 13 Feb 2008 09:11:29 +0100
Message-ID: <20080213081128.GA27730@auto.tuwien.ac.at>
In-Reply-To: <20080213074209.GG24004@spearce.org>
On Wed, Feb 13, 2008 at 02:42:09AM -0500, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > I think it is a good idea to always have some sanity checks on any
> > incoming objects, so as to make sure they're well formed and valid before
> > giving them a SHA1 value, and to bail out as soon as any error is found.
>
> When we get the raw data for an object so we can compute its SHA-1
> and/or write its loose object to disk we should first verify its
> content is sane, then do the SHA-1/store loose.
>
[...]
> > As to making sure those objects are well connected... well this is a
> > technically different issue entirely, and I wonder if a special mode to
> > fsck might not be a better solution.
>
> Nah, just do what quickfetch does in builtin-fetch.c, but run it
> in receive-pack, between unpack() and execute_commands():
>
>     rev-list --quiet --objects $new... --not --all
>
> If it aborts, reachability testing failed and the push is rejected
> without updating any refs. Yes, your repository now has objects
> that are missing things, but none of those are considered to be
> reachable, so this isn't a big deal. They will get cleaned up on
> the next `gc --prune`, whenever that is.
This would mean that we must make git-rev-list and git-pack-objects
not segfault on incorrect links between objects.
> In this configuration (--quiet) rev-list tries to be pretty low
> on its memory usage, it doesn't save buffers, etc. Further since
> everything that is already considered reachable is not interesting,
> we are only doing a walk over the objects that we just received,
> not our entire ODB. It's also after index-pack exited, so we just
> freed up a good chunk of memory.
>
> Remember we are talking about receive-pack here. The cost to
> perform the rev-list is lower than the cost will be to pack
> these objects for distribution back to just one client. Since this
> is a server of some sort (otherwise why did you push here?), odds
> are it's going to be handling a lot of packing requests for clients to
> receive these newly uploaded objects by the native git protocol.
> This new rev-list is nothing compared to that already existing load.
> And if your OS is any good the just-created .idx and .pack are still
> in the OS buffer cache, so there shouldn't be any additional disk I/O.
>
> Yes, we could make this optional in receive-pack, but really I don't
> see a reason to. Just run it. The client shouldn't be giving us
> unreachable crap.
Looks sane to me.
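
For illustration, the connectivity check proposed above can be tried by hand
with a small shell function. This is only a sketch of the idea, not the
receive-pack change itself: the function name check_connected is made up
here, and it assumes it is run inside the receiving repository after the
new objects have been unpacked but before any ref has been updated.

```shell
# Hypothetical helper (the name is an assumption, not taken from the patch).
# Arguments: the object names the push wants the refs to point at.
# Walks only objects that are not already reachable from existing refs
# and exits non-zero if any commit, tree or blob on the way is missing.
check_connected() {
    git rev-list --quiet --objects "$@" --not --all
}

# Possible use, with $new holding a pushed commit id:
#   check_connected "$new" || echo "push rejected: missing objects" >&2
```

A non-zero exit status would correspond to rejecting the push without
touching any ref.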
Regards, Martin Kögler