* [PATCH 1/2] parse_commit_buffer: don't parse invalid commits
@ 2008-01-13 18:11 Martin Koegler
2008-01-13 18:11 ` [PATCH 2/2] git-fsck: remove commit test already done by parse_commit_buffer Martin Koegler
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Martin Koegler @ 2008-01-13 18:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Martin Koegler
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
---
commit.c | 35 ++++++++++++++++++++++++-----------
1 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/commit.c b/commit.c
index f074811..5b52daa 100644
--- a/commit.c
+++ b/commit.c
@@ -48,22 +48,34 @@ struct commit *lookup_commit(const unsigned char *sha1)
return check_commit(obj, sha1, 0);
}
-static unsigned long parse_commit_date(const char *buf)
+static int parse_commit_date(const char *buf, const char *tail, unsigned long *date)
{
- unsigned long date;
+ const char *dateptr;
+ if (buf + 6 >= tail)
+ return 0;
if (memcmp(buf, "author", 6))
return 0;
- while (*buf++ != '\n')
+ while (buf < tail && *buf++ != '\n')
/* nada */;
+ if (buf + 9 >= tail)
+ return 0;
if (memcmp(buf, "committer", 9))
return 0;
- while (*buf++ != '>')
+ while (buf < tail && *buf++ != '>')
/* nada */;
- date = strtoul(buf, NULL, 10);
- if (date == ULONG_MAX)
- date = 0;
- return date;
+ if (buf >= tail)
+ return 0;
+ dateptr = buf;
+ while (buf < tail && *buf++ != '\n')
+ /* nada */;
+ if (buf >= tail)
+ return 0;
+ /* dateptr < buf && buf[-1] == '\n', so strtoul will stop at buf-1 */
+ *date = strtoul(dateptr, NULL, 10);
+ if (*date == ULONG_MAX)
+ *date = 0;
+ return 1;
}
static struct commit_graft **commit_graft;
@@ -236,9 +248,9 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
return 0;
item->object.parsed = 1;
tail += size;
- if (tail <= bufptr + 5 || memcmp(bufptr, "tree ", 5))
+ if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
return error("bogus commit object %s", sha1_to_hex(item->object.sha1));
- if (tail <= bufptr + 45 || get_sha1_hex(bufptr + 5, parent) < 0)
+ if (get_sha1_hex(bufptr + 5, parent) < 0)
return error("bad tree pointer in commit %s",
sha1_to_hex(item->object.sha1));
item->tree = lookup_tree(parent);
@@ -275,7 +287,8 @@ int parse_commit_buffer(struct commit *item, void *buffer, unsigned long size)
n_refs++;
}
}
- item->date = parse_commit_date(bufptr);
+ if (!parse_commit_date(bufptr, tail, &item->date))
+ return error("bogus commit date in object %s", sha1_to_hex(item->object.sha1));
if (track_object_refs) {
unsigned i = 0;
--
1.4.4.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] git-fsck: remove commit test already done by parse_commit_buffer
2008-01-13 18:11 [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Martin Koegler
@ 2008-01-13 18:11 ` Martin Koegler
2008-01-13 23:23 ` [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Junio C Hamano
2008-01-14 7:23 ` Junio C Hamano
2 siblings, 0 replies; 7+ messages in thread
From: Martin Koegler @ 2008-01-13 18:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Martin Koegler
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
---
builtin-fsck.c | 12 ------------
1 files changed, 0 insertions(+), 12 deletions(-)
diff --git a/builtin-fsck.c b/builtin-fsck.c
index e4874f6..a77d8c0 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -360,18 +360,6 @@ static int fsck_commit(struct commit *commit)
fprintf(stderr, "Checking commit %s\n",
sha1_to_hex(commit->object.sha1));
- if (memcmp(buffer, "tree ", 5))
- return objerror(&commit->object, "invalid format - expected 'tree' line");
- if (get_sha1_hex(buffer+5, tree_sha1) || buffer[45] != '\n')
- return objerror(&commit->object, "invalid 'tree' line format - bad sha1");
- buffer += 46;
- while (!memcmp(buffer, "parent ", 7)) {
- if (get_sha1_hex(buffer+7, sha1) || buffer[47] != '\n')
- return objerror(&commit->object, "invalid 'parent' line format - bad sha1");
- buffer += 48;
- }
- if (memcmp(buffer, "author ", 7))
- return objerror(&commit->object, "invalid format - expected 'author' line");
free(commit->buffer);
commit->buffer = NULL;
if (!commit->tree)
--
1.4.4.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] parse_commit_buffer: don't parse invalid commits
2008-01-13 18:11 [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Martin Koegler
2008-01-13 18:11 ` [PATCH 2/2] git-fsck: remove commit test already done by parse_commit_buffer Martin Koegler
@ 2008-01-13 23:23 ` Junio C Hamano
2008-01-14 6:46 ` Martin Koegler
2008-01-14 7:23 ` Junio C Hamano
2 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-13 23:23 UTC (permalink / raw)
To: Martin Koegler; +Cc: git
I think I saw this patch and vaguely recall commenting on it,
but your message is not very helpful locating the previous
thread to see what kind of improvements are made to this new
round.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] parse_commit_buffer: don't parse invalid commits
2008-01-13 18:11 [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Martin Koegler
2008-01-13 18:11 ` [PATCH 2/2] git-fsck: remove commit test already done by parse_commit_buffer Martin Koegler
2008-01-13 23:23 ` [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Junio C Hamano
@ 2008-01-14 7:23 ` Junio C Hamano
2008-01-14 20:58 ` Martin Koegler
2 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2008-01-14 7:23 UTC (permalink / raw)
To: Martin Koegler; +Cc: git
Martin Koegler <mkoegler@auto.tuwien.ac.at> writes:
> + if (!parse_commit_date(bufptr, tail, &item->date))
> + return error("bogus commit date in object %s", sha1_to_hex(item->object.sha1));
>
> if (track_object_refs) {
> unsigned i = 0;
I suspect this might be an undesirable regression.
If somebody managed to create a commit with a bogus "author"
line and wanted to clean up the history, your previous one at
least gave something usable back, even though it had to come up
with a bogus date. It gave the rest of the data back without
barfing. And it was easy to see which "resurrected" commit had
a missing author date (bogus ones always gave 0 timestamp).
This round you made it to error out, and callers that check the
return value of parse_commit() would stop traversing the
history, even if the commit in question has perfectly valid
"parent " lines, thinking "ah, this commit object is faulty".
It actively interferes with attempts to resurrect data from
history that contains a faulty commit.
Your previous version was much better with respect to this
issue. It was about being more careful not to read outside the
commit object buffer, while still allowing the data from a
history that has an unfortunate commit with broken author line
to be resurrected more easily.
I do not think the checks done by fsck and parse_commit should
share the same strictness. They serve different purposes.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] parse_commit_buffer: don't parse invalid commits
2008-01-14 7:23 ` Junio C Hamano
@ 2008-01-14 20:58 ` Martin Koegler
2008-01-14 21:42 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Martin Koegler @ 2008-01-14 20:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sun, Jan 13, 2008 at 11:23:40PM -0800, Junio C Hamano wrote:
> Martin Koegler <mkoegler@auto.tuwien.ac.at> writes:
>
> > + if (!parse_commit_date(bufptr, tail, &item->date))
> > + return error("bogus commit date in object %s", sha1_to_hex(item->object.sha1));
> >
> > if (track_object_refs) {
> > unsigned i = 0;
>
> I suspect this might be an undesirable regression.
You seem to have missed my reply to your last mail:
http://marc.info/?l=git&m=119969163624138&w=2
I asked you, what you would think about this change, but got no
answer. Anyway, now I know your opinion ;-)
> If somebody managed to create a commit with a bogus "author"
> line and wanted to clean up the history, your previous one at
> least gave something usable back, even though it had to come up
> with a bogus date. It gave the rest of the data back without
> barfing. And it was easy to see which "resurrected" commit had
> a missing author date (bogus ones always gave 0 timestamp).
>
> This round you made it to error out, and callers that check the
> return value of parse_commit() would stop traversing the
> history, even if the commit in question has perfectly valid
> "parent " lines, thinking "ah, this commit object is faulty".
> It actively interferes with attempts to resurrect data from
> history that contains a faulty commit.
On the other hand, it is possible, that somebody pushed such a commit
out, if he does not notice it. Then its difficult to get rid of the
broken commit. [Hasn't happend a broken commit on pu recently?]
parse_commit_date is not very strict, so its likely, that it miss such
a commit. Commit parsing is too common function to slow it down with
further checks.
> Your previous version was much better with respect to this
> issue. It was about being more careful not to read outside the
> commit object buffer, while still allowing the data from a
> history that has an unfortunate commit with broken author line
> to be resurrected more easily.
I'll repost a version, which reverts this change.
> I do not think the checks done by fsck and parse_commit should
> share the same strictness. They serve different purposes.
Maybe I can improve fsck.
mfg Martin Kögler
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] parse_commit_buffer: don't parse invalid commits
2008-01-14 20:58 ` Martin Koegler
@ 2008-01-14 21:42 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2008-01-14 21:42 UTC (permalink / raw)
To: Martin Koegler; +Cc: git
mkoegler@auto.tuwien.ac.at (Martin Koegler) writes:
> You seem to have missed my reply to your last mail:
> http://marc.info/?l=git&m=119969163624138&w=2
Yeah, apparently I did.
> On the other hand, it is possible, that somebody pushed such a commit
> out, if he does not notice it. Then its difficult to get rid of the
> broken commit. [Hasn't happend a broken commit on pu recently?]
That's true, but I wonder if there is a reasonable middle
ground. For example, we could keep the warning you added (by
calling error() to identify which commit object is suspicious)
without making the whole function return that error. Then a
broken commit could have been catched while reviewing the series
before the commit gets pushed out.
>> I do not think the checks done by fsck and parse_commit should
>> share the same strictness. They serve different purposes.
>
> Maybe I can improve fsck.
Thanks. It should notice bogus commits but it should allow
checking objects referenced by them unless the particular
bogosity is severe enough to make "tree " and "parent " lines in
them invalid.
IOW, the information on the "author" line is not that important.
It has zero importance from the structural viewpoint, just like
the commit log message contents does not have any structural
importance (it does not even affect how the log output is sorted
at all, while the timestamp on the "committer" line has some say
in it).
Noticing a bogus "author" or "committer" line is like noticing
that your commit log message has garbage characters in it.
Maybe a future option to fsck might want to check that all log
messages are encoded in UTF-8, and you may catch some that are
in ISO-8859-1. After flagging them as such, the program should
still continue digging the history down to check its ancestors,
as such a breakage does not affect the structural integrity of
the history.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-01-14 21:43 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-13 18:11 [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Martin Koegler
2008-01-13 18:11 ` [PATCH 2/2] git-fsck: remove commit test already done by parse_commit_buffer Martin Koegler
2008-01-13 23:23 ` [PATCH 1/2] parse_commit_buffer: don't parse invalid commits Junio C Hamano
2008-01-14 6:46 ` Martin Koegler
2008-01-14 7:23 ` Junio C Hamano
2008-01-14 20:58 ` Martin Koegler
2008-01-14 21:42 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).