From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Nicolas Pitre <nico@cam.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Tue, 12 Feb 2008 22:38:58 +0100 [thread overview]
Message-ID: <20080212213858.GA29151@auto.tuwien.ac.at> (raw)
In-Reply-To: <alpine.LFD.1.00.0802121507310.2732@xanadu.home>
On Tue, Feb 12, 2008 at 03:22:17PM -0500, Nicolas Pitre wrote:
> On Tue, 12 Feb 2008, Martin Koegler wrote:
> > We will need some additional memory for struct blob/tree/tag/commit
> > even for this check.
>
> Why so?
>
> Each received object is stored in memory when received, so why can't you
> simply validate it in place? No need to keep trace of them afterward.
Freeing the data is not my problem.
Many validations are in parse_XXX_buffer, which are also used by
fsck. This returns a struct commit/tree/tag/blob.
I have not found any code in git to free them.
Same for pack-objects, e.g. add_objects_in_unpacked_packs allocates
many struct object via lookup_unknown_object. As far as I understand
the code, they are never freed, even if they are not needed later.
> > > As to making sure those objects are well connected... well this is a
> > > technically different issue entirely, and I wonder if a special mode to
> > > fsck might not be a better solution. For example, fsck could be made to
> > > validate object connectivity, starting from the new ref(s), and stopping
> > > object walking as soon as a reference to an object not included in the
> > > newly received pack is encountered. This could be run from some hook to
> > > decide whether or not to update the new refs, and to delete the pack
> > > otherwise.
> >
> > Do you really think, that this will need less memory? fsck loads first
> > all objects and then verifies their connections.
>
> Not all objects otherwise I wouldn't even be able to run it.
It only loads all objects, it checks (eg. no objects from the pack file
by default). After loading a object, it frees the content of object,
but keeps the parsed information in memory for the connectivity check.
> My point is that you can have fsck load only objects contained in the
> received pack (you can use the pack index to load them) and assume
> connectivity is good whenever an object in the pack reference an
> existing object outside of the pack. At least this doesn't need to
> happen in parallel with pack indexing.
So you propose the following:
1) run index-pack (write pack file + index in repository)
2) fsck pack file
3) update refs
This looks to be very prone to race conditions, if multiple
pushes run concurrently.
To be on the safe side, the checks must be finished, before a pack/object
becomes part of the repository. We must check the reachability before
index-pack writes the index.
In my opinion, separating the code for the reachability check from
index-pack would complicate things. It would not safe memory, as the
memory would be used by two processes instead of one.
You don't want to increase the resource usage of index-pack. If
--strict is not passed, I implemented this.
For --strict, it is not avoidable, that index-pack will use more
memory and cpu time.
mfg Martin Kögler
next prev parent reply other threads:[~2008-02-12 21:39 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-10 17:58 [RFC Patch] Preventing corrupt objects from entering the repository Martin Koegler
2008-02-11 0:00 ` Junio C Hamano
2008-02-11 0:33 ` Nicolas Pitre
2008-02-11 19:56 ` Martin Koegler
2008-02-11 20:41 ` Nicolas Pitre
2008-02-11 21:58 ` Martin Koegler
2008-02-12 16:02 ` Nicolas Pitre
2008-02-12 19:04 ` Martin Koegler
2008-02-12 20:22 ` Nicolas Pitre
2008-02-12 21:38 ` Martin Koegler [this message]
2008-02-12 21:51 ` Nicolas Pitre
2008-02-13 6:20 ` Shawn O. Pearce
2008-02-13 7:39 ` Martin Koegler
2008-02-14 9:00 ` [RFC PATCH] Remove object-refs from fsck Shawn O. Pearce
2008-02-14 19:07 ` Martin Koegler
2008-02-13 7:42 ` [RFC Patch] Preventing corrupt objects from entering the repository Shawn O. Pearce
2008-02-13 8:11 ` Martin Koegler
2008-02-13 12:01 ` Johannes Schindelin
2008-02-14 6:16 ` Shawn O. Pearce
2008-02-14 19:04 ` Martin Koegler
2008-02-15 0:06 ` Johannes Schindelin
2008-02-15 7:18 ` Martin Koegler
2008-02-12 7:20 ` Martin Koegler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080212213858.GA29151@auto.tuwien.ac.at \
--to=mkoegler@auto.tuwien.ac.at \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=nico@cam.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).