From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Nicolas Pitre <nico@cam.org>, Junio C Hamano <gitster@pobox.com>,
git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Fri, 15 Feb 2008 08:18:00 +0100 [thread overview]
Message-ID: <20080215071800.GA27772@auto.tuwien.ac.at> (raw)
In-Reply-To: <alpine.LSU.1.00.0802142345450.30505@racer.site>
On Fri, Feb 15, 2008 at 12:06:49AM +0000, Johannes Schindelin wrote:
> On Thu, 14 Feb 2008, Martin Koegler wrote:
> > @@ -552,8 +553,10 @@ static struct commit_list *merge_bases(struct commit *one, struct commit *two)
> > */
> > return commit_list_insert(one, &result);
> >
> > - parse_commit(one);
> > - parse_commit(two);
> > + if (parse_commit(one))
> > + die("invalid commit");
> > + if (parse_commit(two))
> > + die("invalid commit");
>
> I'd rather have this return NULL after displaying an error. After all,
> merge_bases() is sort of "libified".
Would all caller of get_merge_bases treat NULL as an error?
> BTW a simple "git grep parse_commit\( | grep -vwe if -e gitweb" shows
> this:
>
> builtin-blame.c: parse_commit(commit);
> builtin-checkout.c: parse_commit(commit);
> builtin-checkout.c: parse_commit(old->commit);
> builtin-checkout.c: parse_commit(new->commit);
> builtin-checkout.c: parse_commit(new.commit);
> builtin-describe.c: parse_commit(p);
> builtin-describe.c: parse_commit(p);
> builtin-fast-export.c: parse_commit(commit);
> builtin-fast-export.c: parse_commit(commit->parents->item);
> builtin-fetch-pack.c: parse_commit(commit);
> builtin-fetch-pack.c: parse_commit(commit);
> builtin-fetch-pack.c: parse_commit(commit);
> builtin-name-rev.c: parse_commit(commit);
> builtin-show-branch.c: parse_commit(p);
> builtin-show-branch.c: parse_commit(commit);
> commit.c:int parse_commit(struct commit *item)
> commit.c: parse_commit(commit);
> commit.c: parse_commit(one);
> commit.c: parse_commit(two);
> commit.c: parse_commit(p);
> commit.h:int parse_commit(struct commit *item);
> contrib/gitview/gitview: self.parse_commit(commit_lines)
> contrib/gitview/gitview: def parse_commit(self, commit_lines):
> shallow.c: parse_commit(commit);
> upload-pack.c: parse_commit((struct commit *)object);
>
> A few of those need checking, too, I think.
Same for prepare_revision_walk. It can fail because an error in parse_XXX,
but no caller checks the result.
> > diff --git a/revision.c b/revision.c
> > index 6e85aaa..9f8723d 100644
> > --- a/revision.c
> > +++ b/revision.c
> > @@ -46,6 +46,8 @@ void add_object(struct object *obj,
> >
> > static void mark_blob_uninteresting(struct blob *blob)
> > {
> > + if (!blob)
> > + return;
>
> IMHO not needed [*1*]. The first user of this (static) function calls it
> with lookup_blob(), which assumes that the blob has been read already.
What about calling handle_revision_arg with "^<tree-sha1>", where a
non blob object is stored under a blob mode?
handle_revision_arg will parse only the tree and put it on the pending
list. prepare_revision_walk will call mark_tree_uninteresting on it.
For any object with blob mode in the tree, it calls
mark_blob_uninteresting(lookup_blob(entry.sha1)); If entry.sha1 is the
sha1 of an already loaded non blob object, we need this check.
> [*1*] There might be a few people arguing that defensive programming is
> never wrong.
>
> Alas, I saw my share of defensive programming, and more often than not,
> the real bugs were hard to find in between all those checks. And some
> checks were actively wrong -- again hard to see...
>
> Remember: You can make code so simple that there are obviously no bugs.
> And the other way is to make it so complicated that there are no obvious
> bugs.
I would put rev_info in the second category. What can not be done with
it?
> > @@ -57,6 +59,8 @@ void mark_tree_uninteresting(struct tree *tree)
> > struct name_entry entry;
> > struct object *obj = &tree->object;
> >
> > + if (!obj)
> > + return;
>
> Same here.
Same argument (s/blob/tree/g).
> > @@ -94,6 +98,8 @@ void mark_parents_uninteresting(struct commit *commit)
> >
> > while (parents) {
> > struct commit *commit = parents->item;
> > + if (!commit)
> > + continue;
>
> Can parents->item really be NULL? I think not. And indeed, a little
> search reveals that parse_commit_buffer() does this when it constructs the
> parents list, and encounters an invalid SHA-1:
>
> return error("bad parents in commit %s", sha1_to_hex(item->object.sha1));
>
> In case it is a valid SHA-1, but not a commit, it is ignored.
Its unnecessary, I missed this check.
> > @@ -173,6 +179,8 @@ static struct commit *handle_commit(struct rev_info *revs, struct object *object
> > struct tag *tag = (struct tag *) object;
> > if (revs->tag_objects && !(flags & UNINTERESTING))
> > add_pending_object(revs, object, tag->tag);
> > + if (!tag->tagged)
> > + die("bad tag");
>
> I haven't looked yet, but I suspect that this is as impossible as the
> invalid parent.
Eg. tag pointing to an blob, but when calling parse_tag_buffer, a non
blob object is loaded under blob sha1.
But wouldn't it be simpler, if we would add
if (!item->tagged)
return -1;
in parse_tag_buffer?
> > diff --git a/sha1_file.c b/sha1_file.c
> > index 4179949..c25ce64 100644
> > --- a/sha1_file.c
> > +++ b/sha1_file.c
> > @@ -1943,7 +1943,8 @@ void *read_object_with_reference(const unsigned char *sha1,
> > }
> > ref_length = strlen(ref_type);
> >
> > - if (memcmp(buffer, ref_type, ref_length) ||
> > + if (ref_length + 40 >= size ||
> > + memcmp(buffer, ref_type, ref_length) ||
>
> Makes sense.
>
> > @@ -494,6 +498,8 @@ static int peel_onion(const char *name, int len, unsigned char *sha1)
> > return error("%.*s: expected %s type, but the object dereferences to %s type",
> > len, name, typename(expected_type),
> > typename(o->type));
> > + if (!o)
> > + return -1;
>
> You probably want to guard for ((tag *)o)->tagged == NULL; Okay, now I am
> curious. *megoesandlooks* Indeed, if the tag refers to an unknown type,
> tagged = NULL.
>
> But I think in that case, it should return an error(). And maybe only for
> type == OBJ_TAG.
>
> > @@ -580,6 +586,8 @@ static int handle_one_ref(const char *path,
> > return 0;
> > if (object->type == OBJ_TAG)
> > object = deref_tag(object, path, strlen(path));
> > + if (!object)
> > + return 0;
>
> As above, it looks strange to see that object->something is checked, and
> _then_ object. I'd put this into curly brackets, together with the
> deref_tag(), just to help the reader a bit, should she be as weak in mind
> as yours truly.
>
> > @@ -617,7 +625,8 @@ static int get_sha1_oneline(const char *prefix, unsigned char *sha1)
> > unsigned long size;
> >
> > commit = pop_most_recent_commit(&list, ONELINE_SEEN);
> > - parse_object(commit->object.sha1);
> > + if(!parse_object(commit->object.sha1))
> > + continue;
>
> Makes sense, but please add a space after the "if".
>
> > diff --git a/tag.c b/tag.c
> > index 38bf913..990134f 100644
> > --- a/tag.c
> > +++ b/tag.c
> > @@ -9,7 +9,10 @@ const char *tag_type = "tag";
> > struct object *deref_tag(struct object *o, const char *warn, int warnlen)
> > {
> > while (o && o->type == OBJ_TAG)
> > - o = parse_object(((struct tag *)o)->tagged->sha1);
> > + if (((struct tag *)o)->tagged)
> > + o = parse_object(((struct tag *)o)->tagged->sha1);
> > + else
> > + o = NULL;
>
> Knowing that tagged _can_ be NULL, this makes tons of sense. Except that
> again, I'd call error() to tell the user, before setting o = NULL.
mfg Martin Kögler
next prev parent reply other threads:[~2008-02-15 7:18 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-10 17:58 [RFC Patch] Preventing corrupt objects from entering the repository Martin Koegler
2008-02-11 0:00 ` Junio C Hamano
2008-02-11 0:33 ` Nicolas Pitre
2008-02-11 19:56 ` Martin Koegler
2008-02-11 20:41 ` Nicolas Pitre
2008-02-11 21:58 ` Martin Koegler
2008-02-12 16:02 ` Nicolas Pitre
2008-02-12 19:04 ` Martin Koegler
2008-02-12 20:22 ` Nicolas Pitre
2008-02-12 21:38 ` Martin Koegler
2008-02-12 21:51 ` Nicolas Pitre
2008-02-13 6:20 ` Shawn O. Pearce
2008-02-13 7:39 ` Martin Koegler
2008-02-14 9:00 ` [RFC PATCH] Remove object-refs from fsck Shawn O. Pearce
2008-02-14 19:07 ` Martin Koegler
2008-02-13 7:42 ` [RFC Patch] Preventing corrupt objects from entering the repository Shawn O. Pearce
2008-02-13 8:11 ` Martin Koegler
2008-02-13 12:01 ` Johannes Schindelin
2008-02-14 6:16 ` Shawn O. Pearce
2008-02-14 19:04 ` Martin Koegler
2008-02-15 0:06 ` Johannes Schindelin
2008-02-15 7:18 ` Martin Koegler [this message]
2008-02-12 7:20 ` Martin Koegler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080215071800.GA27772@auto.tuwien.ac.at \
--to=mkoegler@auto.tuwien.ac.at \
--cc=Johannes.Schindelin@gmx.de \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=nico@cam.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).