From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Tue, 12 Feb 2008 08:20:30 +0100 [thread overview]
Message-ID: <20080212072030.GA20857@auto.tuwien.ac.at> (raw)
In-Reply-To: <7vmyq8cqfn.fsf@gitster.siamese.dyndns.org>
On Sun, Feb 10, 2008 at 04:00:44PM -0800, Junio C Hamano wrote:
> mkoegler@auto.tuwien.ac.at (Martin Koegler) writes:
> > * add --strict option to unpack-objects (patch 1,8,9)
> > * add --strict option to index-pack (patch 8,10)
> >
> > Same as for unpack-objects, but without writting objects.
> >
> > * add config option for receive pack to enable checking (patch 11)
>
> If this patch is any good, I strongly suspect it should not be
> just the default but should always be on. IOW no config is
> necessary. That would make the series a bit simpler, I guess.
Not very much. In patch 11, we would not need to parse a config
variable and could pass --strict option unconditional to
index-pack/unpack-objects.
For the moment, I would like to keep it as an option (enabled by
default). If somebody has a repository with a totally broken history,
he would not be able to push any more, if there is no way to disable
it.
> > From 76e86fe55345e633c910d6b8fe166e27c23c5aaf Mon Sep 17 00:00:00 2001
> > From: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > Date: Fri, 8 Feb 2008 08:51:38 +0100
> > Subject: [PATCH 01/12] unpack-object: cache for non written objects
> >
> > Preventing objects with broken links entering the repository
> > means, that write of some objects must be delayed.
> >
> > This patch adds a cache to keep the object data in memory. The delta
> > resolving code must also search in the cache.
>
> I have to wonder what the memory pressure in real-life usage
> will be like.
>
> When an object is proven to be good, we should be able to free
> its buffer after writing it out, but would that be a good enough
> optimization we can make later on this code to keep its memory
> consumption manageable?
This code is only used by unpack-objects, which is used for small
packs. It only caches metadata (tree,commit,tag), no blobs. So the
memory usage should not be a problem.
> > diff --git a/fsck.c b/fsck.c
> > new file mode 100644
> > index 0000000..089f775
> > --- /dev/null
> > +++ b/fsck.c
> > @@ -0,0 +1,84 @@
> > +#include "cache.h"
> > +#include "object.h"
> > +#include "blob.h"
> > +#include "tree.h"
> > +#include "tree-walk.h"
> > +#include "commit.h"
> > +#include "tag.h"
> > +#include "fsck.h"
> > +
> > +static int fsck_walk_tree(struct tree* tree, fsck_walk_func walk, void* data)
> > +{
> > + struct tree_desc desc;
> > + struct name_entry entry;
> > +
> > + if(parse_tree(tree))
> > + return -1;
>
> It's a bit hard to see how these new set of functions relate to
> the original code in this patch series, because you add the new
> things that are initially not used anywhere independently, start
> referring to them in a separate patch and then remove the old
> related functions that are now unused. This style makes
> reviewing easier and harder at the same time...
Will try to restructure the patches.
> > From 80b22c3f2c3e13c207790a49646020c55b34bba7 Mon Sep 17 00:00:00 2001
> > From: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > Date: Fri, 8 Feb 2008 09:01:50 +0100
> > Subject: [PATCH 03/12] fsck: move mark-reachable to fsck_walk
> >
> > Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > ---
> > builtin-fsck.c | 34 ++++++++++++++++++++++++----------
> > 1 files changed, 24 insertions(+), 10 deletions(-)
> > ...
> > +static int mark_object(struct object* obj, int type, void* data)
> > +{
> > + if (!obj)
> > + return 0;
> > + if (obj->flags & REACHABLE)
> > + return 0;
> > + obj->flags |= REACHABLE;
> > + if (!obj->parsed)
> > + return 0;
> > + return fsck_walk(obj, mark_object, data);
> > +}
>
> Hmm. The return value 0 means Ok and negative is error? The
> reason we can say success if obj is NULL or it is not parsed yet
> is because...?
In mark_objects, I don't care for the results. Its should only
mark all reachable objects (without crashing).
> > @@ -326,8 +344,6 @@ static int fsck_tree(struct tree *item)
> > o_name = name;
> > o_sha1 = sha1;
> > }
> > - free(item->buffer);
> > - item->buffer = NULL;
>
> Hmm. The reason you still need the buffer after you checked the
> contents of the tree in the loop is because you haven't actually
> checked the referents are Ok. But I do not see a corresponding
> free that releases this memory after you are actually done with
> the verification with fsck_walk() yet, so we leak this in the
> meantime?
The tree is traversed multiple times:
* to mark reachable objects
* to mark it's childs used
* to check for broken links
The last use of the tree content is, after all objects are already in
memory.
> > @@ -375,8 +391,6 @@ static int fsck_commit(struct commit *commit)
> > }
> > if (memcmp(buffer, "author ", 7))
> > return objerror(&commit->object, "invalid format - expected 'author' line");
> > - free(commit->buffer);
> > - commit->buffer = NULL;
> > if (!commit->tree)
> > return objerror(&commit->object, "could not load commit's tree %s", tree_sha1);
> > if (!commit->parents && show_root)
>
> Likewise.
Will change.
> > From ce43251ef71962ff64fe138f1295c405ef6aaf65 Mon Sep 17 00:00:00 2001
> > From: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > Date: Fri, 8 Feb 2008 09:04:08 +0100
> > Subject: [PATCH 04/12] fsck: move reachable object check to fsck_walk
> >
> > It handles NULL pointers in object references without crashing.
> >
> > Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > ---
> > builtin-fsck.c | 49 +++++++++++++++++++++++++++++--------------------
> > 1 files changed, 29 insertions(+), 20 deletions(-)
> >
> > diff --git a/builtin-fsck.c b/builtin-fsck.c
> > index 49e96ff..2c1e10f 100644
> > --- a/builtin-fsck.c
> > +++ b/builtin-fsck.c
> > @@ -81,13 +81,39 @@ static int objwarning(struct object *obj, const char *err, ...)
> > return -1;
> > }
> >
> > +static int check_reachable_object_childs(struct object *obj, int type, void *data)
> > +{
> > + struct object *parent = data;
> > + if (!obj) {
> > + printf("broken link from %7s %s\n",
> > + typename(parent->type), sha1_to_hex(parent->sha1));
> > + printf("broken link from %7s %s\n",
> > + (type==OBJ_ANY?"unknown":typename(type)), "unknown");
>
> Hmm? I am not sure what this part is reporting...
If eg. the sha1 of a commit is stored as tree in a commit, you will
find a null pointer in struct commit->tree. Such things will hit this
check.
> > From ee11f771be1ef1c29725cb56ab3eb8dfe61ca25a Mon Sep 17 00:00:00 2001
> > From: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > Date: Fri, 8 Feb 2008 09:07:33 +0100
> > Subject: [PATCH 06/12] create a common object checker code out of fsck
> >
> > The function provides a callback for reporting errors.
>
> The same "add unused new stuff independently, later use it and
> then finally remove now unused old stuff" pattern is here. I am
> neutral to that patch style but it is a bit harder to see what
> is going on.
>
> Most of the changes seem to be straight and sane copy-and-paste though.
I'll merge this patch with the next patch.
> > From a8db4e754e717bac0b2462333d4145eac3452099 Mon Sep 17 00:00:00 2001
> > From: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> > Date: Fri, 8 Feb 2008 09:14:14 +0100
> > Subject: [PATCH 09/12] unpack-objects: prevent writing of inconsistent objects
> >
> > This patch introduces a strict mode, which ensures that:
> > - no malformed object will be written
> > - no object with broken links will be written
> >
> > The patch ensures this by delaying the write of all non blob object.
> > These object are written, after all objects they link to are written.
> >
> > An error can only result in unreferenced objects.
>
> > diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
> > index f18c7e8..3e906e4 100644
> > --- a/builtin-unpack-objects.c
> > +++ b/builtin-unpack-objects.c
> > @@ -173,7 +250,6 @@ static void resolve_delta(unsigned nr, enum object_type type,
> > die("failed to apply delta");
> > free(delta);
> > write_object(nr, type, result, result_size);
> > - free(result);
> > }
>
> And this is freed later elsewhere?
The free is the task of write_object. If (!strict), then it is freeed
immediatly. Similar for blobs. Other objects are put in the cache (see
patch 1).
> > @@ -203,7 +279,8 @@ static void unpack_non_delta_entry(enum object_type type, unsigned long size,
> >
> > if (!dry_run && buf)
> > write_object(nr, type, buf, size);
> > - free(buf);
> > + else if (buf)
> > + free(buf);
> > }
>
> You can always free NULL without checking.
Will fix.
> > @@ -356,6 +434,7 @@ static void unpack_all(void)
> > if (!quiet)
> > progress = start_progress("Unpacking objects", nr_objects);
> > obj_list = xmalloc(nr_objects * sizeof(*obj_list));
> > + memset(obj_list, 0, nr_objects * sizeof(*obj_list));
>
> Hmm, is this a fix to the 'master' independent from all the rest
> of your patches, or a new requirement?
This is, because the new field in obj_list must be initialized with
zero.
mfg martin Kögler
prev parent reply other threads:[~2008-02-12 7:21 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-10 17:58 [RFC Patch] Preventing corrupt objects from entering the repository Martin Koegler
2008-02-11 0:00 ` Junio C Hamano
2008-02-11 0:33 ` Nicolas Pitre
2008-02-11 19:56 ` Martin Koegler
2008-02-11 20:41 ` Nicolas Pitre
2008-02-11 21:58 ` Martin Koegler
2008-02-12 16:02 ` Nicolas Pitre
2008-02-12 19:04 ` Martin Koegler
2008-02-12 20:22 ` Nicolas Pitre
2008-02-12 21:38 ` Martin Koegler
2008-02-12 21:51 ` Nicolas Pitre
2008-02-13 6:20 ` Shawn O. Pearce
2008-02-13 7:39 ` Martin Koegler
2008-02-14 9:00 ` [RFC PATCH] Remove object-refs from fsck Shawn O. Pearce
2008-02-14 19:07 ` Martin Koegler
2008-02-13 7:42 ` [RFC Patch] Preventing corrupt objects from entering the repository Shawn O. Pearce
2008-02-13 8:11 ` Martin Koegler
2008-02-13 12:01 ` Johannes Schindelin
2008-02-14 6:16 ` Shawn O. Pearce
2008-02-14 19:04 ` Martin Koegler
2008-02-15 0:06 ` Johannes Schindelin
2008-02-15 7:18 ` Martin Koegler
2008-02-12 7:20 ` Martin Koegler [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080212072030.GA20857@auto.tuwien.ac.at \
--to=mkoegler@auto.tuwien.ac.at \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).