From: Nicolas Pitre <nico@cam.org>
To: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Tue, 12 Feb 2008 15:22:17 -0500 (EST) [thread overview]
Message-ID: <alpine.LFD.1.00.0802121507310.2732@xanadu.home> (raw)
In-Reply-To: <20080212190411.GA23837@auto.tuwien.ac.at>
On Tue, 12 Feb 2008, Martin Koegler wrote:
> On Tue, Feb 12, 2008 at 11:02:06AM -0500, Nicolas Pitre wrote:
> > I think this is a good idea to always have some sanity checks on any
> > incoming objects so to make sure they're well formed and valid before
> > giving them a SHA1 value, and bail out as soon as any error is found.
> > From my understanding that's what your patch is doing, right? (sorry I
> > can't find them in my mailbox anymore).
>
> Yes. (=>http://marc.info/?l=git&m=120266631524947&w=2)
>
> > This can be done as objects are
> > coming in just fine and requires no extra memory, and I would say this
> > should be done unconditionally all the time. After all, the Git
> > coherency model is based on the SHA1 checksuming, and therefore it is a
> > good idea to never validate any malformed objects with a SHA1. So I'm
> > all in favor of such validation always performed in index-pack and
> > unpack-objects.
>
> We will need some additional memory for struct blob/tree/tag/commit
> even for this check.
Why so?
Each received object is stored in memory when received, so why can't you
simply validate it in place? No need to keep trace of them afterward.
> > As to making sure those objects are well connected... well this is a
> > technically different issue entirely, and I wonder if a special mode to
> > fsck might not be a better solution. For example, fsck could be made to
> > validate object connectivity, starting from the new ref(s), and stopping
> > object walking as soon as a reference to an object not included in the
> > newly received pack is encountered. This could be run from some hook to
> > decide whether or not to update the new refs, and to delete the pack
> > otherwise.
>
> Do you really think, that this will need less memory? fsck loads first
> all objects and then verifies their connections.
Not all objects otherwise I wouldn't even be able to run it.
My point is that you can have fsck load only objects contained in the
received pack (you can use the pack index to load them) and assume
connectivity is good whenever an object in the pack reference an
existing object outside of the pack. At least this doesn't need to
happen in parallel with pack indexing.
Nicolas
next prev parent reply other threads:[~2008-02-12 20:22 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-10 17:58 [RFC Patch] Preventing corrupt objects from entering the repository Martin Koegler
2008-02-11 0:00 ` Junio C Hamano
2008-02-11 0:33 ` Nicolas Pitre
2008-02-11 19:56 ` Martin Koegler
2008-02-11 20:41 ` Nicolas Pitre
2008-02-11 21:58 ` Martin Koegler
2008-02-12 16:02 ` Nicolas Pitre
2008-02-12 19:04 ` Martin Koegler
2008-02-12 20:22 ` Nicolas Pitre [this message]
2008-02-12 21:38 ` Martin Koegler
2008-02-12 21:51 ` Nicolas Pitre
2008-02-13 6:20 ` Shawn O. Pearce
2008-02-13 7:39 ` Martin Koegler
2008-02-14 9:00 ` [RFC PATCH] Remove object-refs from fsck Shawn O. Pearce
2008-02-14 19:07 ` Martin Koegler
2008-02-13 7:42 ` [RFC Patch] Preventing corrupt objects from entering the repository Shawn O. Pearce
2008-02-13 8:11 ` Martin Koegler
2008-02-13 12:01 ` Johannes Schindelin
2008-02-14 6:16 ` Shawn O. Pearce
2008-02-14 19:04 ` Martin Koegler
2008-02-15 0:06 ` Johannes Schindelin
2008-02-15 7:18 ` Martin Koegler
2008-02-12 7:20 ` Martin Koegler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.1.00.0802121507310.2732@xanadu.home \
--to=nico@cam.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mkoegler@auto.tuwien.ac.at \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).