git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tree-walk: be more specific about corrupt tree errors
@ 2014-02-12 19:49 Jeff King
  2014-02-13  8:25 ` Michael Haggerty
  0 siblings, 1 reply; 2+ messages in thread
From: Jeff King @ 2014-02-12 19:49 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty

When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.

Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.

Signed-off-by: Jeff King <peff@peff.net>
---
Michael and I have been looking off-list at some bogus trees (created by
a non-git.git implementation). git-fsck unhelpfully dies during the
tree-walk:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  fatal: corrupt tree file

I think in the long run we want to either teach fsck to avoid the
regular tree-walker or to set a special "continue as much as you can"
flag. That will let us keep going to find more errors, do our usual fsck
error checks (which are more strict), and especially report on _which_
object was broken (we can't do that here because we are deep in the call
stack and may not even have a real object yet).

This patch at least gives us slightly more specific error messages (both
for fsck and for other commands). And it may provide a first step in
clarity if we follow the "continue as much as you can" path.

 tree-walk.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 79dba1d..d53b084 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -28,11 +28,13 @@ static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned
 	unsigned int mode, len;
 
 	if (size < 24 || buf[size - 21])
-		die("corrupt tree file");
+		die("truncated tree file");
 
 	path = get_mode(buf, &mode);
-	if (!path || !*path)
-		die("corrupt tree file");
+	if (!path)
+		die("malformed mode in tree entry");
+	if (!*path)
+		die("empty filename in tree entry");
 	len = strlen(path) + 1;
 
 	/* Initialize the descriptor entry */
@@ -81,7 +83,7 @@ void update_tree_entry(struct tree_desc *desc)
 	unsigned long len = end - (const unsigned char *)buf;
 
 	if (size < len)
-		die("corrupt tree file");
+		die("truncated tree file");
 	buf = end;
 	size -= len;
 	desc->buffer = buf;
-- 
1.8.5.2.500.g8060133

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] tree-walk: be more specific about corrupt tree errors
  2014-02-12 19:49 [PATCH] tree-walk: be more specific about corrupt tree errors Jeff King
@ 2014-02-13  8:25 ` Michael Haggerty
  0 siblings, 0 replies; 2+ messages in thread
From: Michael Haggerty @ 2014-02-13  8:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On 02/12/2014 08:49 PM, Jeff King wrote:
> When the tree-walker runs into an error, it just calls
> die(), and the message is always "corrupt tree file".
> However, we are actually covering several cases here; let's
> give the user a hint about what happened.
> 
> Let's also avoid using the word "corrupt", which makes it
> seem like the data bit-rotted on disk. Our sha1 check would
> already have found that. These errors are ones of data that
> is malformed in the first place.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Michael and I have been looking off-list at some bogus trees (created by
> a non-git.git implementation). git-fsck unhelpfully dies during the
> tree-walk:
> 
>   $ git fsck
>   Checking object directories: 100% (256/256), done.
>   fatal: corrupt tree file
> 
> I think in the long run we want to either teach fsck to avoid the
> regular tree-walker or to set a special "continue as much as you can"
> flag. That will let us keep going to find more errors, do our usual fsck
> error checks (which are more strict), and especially report on _which_
> object was broken (we can't do that here because we are deep in the call
> stack and may not even have a real object yet).
> 
> This patch at least gives us slightly more specific error messages (both
> for fsck and for other commands). And it may provide a first step in
> clarity if we follow the "continue as much as you can" path.
> 
>  tree-walk.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/tree-walk.c b/tree-walk.c
> index 79dba1d..d53b084 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -28,11 +28,13 @@ static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned
>  	unsigned int mode, len;
>  
>  	if (size < 24 || buf[size - 21])
> -		die("corrupt tree file");
> +		die("truncated tree file");
>  

I suggest splitting this into two separate checks, because the first
boolean definitely indicates a truncated file, whereas the second
boolean could indicate malformedness that is not caused by truncation.

Another tiny point: I suppose that the number "24" comes from

    A one-digit mode
    SP
    A one-character filename
    NUL
    20-byte SHA1

But given that you are detecting zero-length filenames a few lines
later, I think it makes logical sense to admit for that possibility
here, by reducing 24 -> 23.  (I realize that it doesn't change the end
result, because the only syntactically correct situation with length=23
would be a doubly-broken line that has a one-digit mode *and* a
zero-length filename, and it's arbitrary which of the forms of
brokenness we report in such a case.)

>  	path = get_mode(buf, &mode);
> -	if (!path || !*path)
> -		die("corrupt tree file");
> +	if (!path)
> +		die("malformed mode in tree entry");
> +	if (!*path)
> +		die("empty filename in tree entry");
>  	len = strlen(path) + 1;
>  
>  	/* Initialize the descriptor entry */
> @@ -81,7 +83,7 @@ void update_tree_entry(struct tree_desc *desc)
>  	unsigned long len = end - (const unsigned char *)buf;
>  
>  	if (size < len)
> -		die("corrupt tree file");
> +		die("truncated tree file");
>  	buf = end;
>  	size -= len;
>  	desc->buffer = buf;
> 

Otherwise, I think this is a nice improvement.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2014-02-13  8:25 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-12 19:49 [PATCH] tree-walk: be more specific about corrupt tree errors Jeff King
2014-02-13  8:25 ` Michael Haggerty

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).