git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] tree-walk: be more specific about corrupt tree errors
Date: Thu, 13 Feb 2014 09:25:40 +0100	[thread overview]
Message-ID: <52FC8184.5090606@alum.mit.edu> (raw)
In-Reply-To: <20140212194956.GA3850@sigill.intra.peff.net>

On 02/12/2014 08:49 PM, Jeff King wrote:
> When the tree-walker runs into an error, it just calls
> die(), and the message is always "corrupt tree file".
> However, we are actually covering several cases here; let's
> give the user a hint about what happened.
> 
> Let's also avoid using the word "corrupt", which makes it
> seem like the data bit-rotted on disk. Our sha1 check would
> already have found that. These errors are ones of data that
> is malformed in the first place.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Michael and I have been looking off-list at some bogus trees (created by
> a non-git.git implementation). git-fsck unhelpfully dies during the
> tree-walk:
> 
>   $ git fsck
>   Checking object directories: 100% (256/256), done.
>   fatal: corrupt tree file
> 
> I think in the long run we want to either teach fsck to avoid the
> regular tree-walker or to set a special "continue as much as you can"
> flag. That will let us keep going to find more errors, do our usual fsck
> error checks (which are more strict), and especially report on _which_
> object was broken (we can't do that here because we are deep in the call
> stack and may not even have a real object yet).
> 
> This patch at least gives us slightly more specific error messages (both
> for fsck and for other commands). And it may provide a first step in
> clarity if we follow the "continue as much as you can" path.
> 
>  tree-walk.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/tree-walk.c b/tree-walk.c
> index 79dba1d..d53b084 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -28,11 +28,13 @@ static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned
>  	unsigned int mode, len;
>  
>  	if (size < 24 || buf[size - 21])
> -		die("corrupt tree file");
> +		die("truncated tree file");
>  

I suggest splitting this into two separate checks, because the first
boolean definitely indicates a truncated file, whereas the second
boolean could indicate malformedness that is not caused by truncation.

Another tiny point: I suppose that the number "24" comes from

    A one-digit mode
    SP
    A one-character filename
    NUL
    20-byte SHA1

But given that you are detecting zero-length filenames a few lines
later, I think it makes logical sense to admit for that possibility
here, by reducing 24 -> 23.  (I realize that it doesn't change the end
result, because the only syntactically correct situation with length=23
would be a doubly-broken line that has a one-digit mode *and* a
zero-length filename, and it's arbitrary which of the forms of
brokenness we report in such a case.)

>  	path = get_mode(buf, &mode);
> -	if (!path || !*path)
> -		die("corrupt tree file");
> +	if (!path)
> +		die("malformed mode in tree entry");
> +	if (!*path)
> +		die("empty filename in tree entry");
>  	len = strlen(path) + 1;
>  
>  	/* Initialize the descriptor entry */
> @@ -81,7 +83,7 @@ void update_tree_entry(struct tree_desc *desc)
>  	unsigned long len = end - (const unsigned char *)buf;
>  
>  	if (size < len)
> -		die("corrupt tree file");
> +		die("truncated tree file");
>  	buf = end;
>  	size -= len;
>  	desc->buffer = buf;
> 

Otherwise, I think this is a nice improvement.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

      reply	other threads:[~2014-02-13  8:25 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-12 19:49 [PATCH] tree-walk: be more specific about corrupt tree errors Jeff King
2014-02-13  8:25 ` Michael Haggerty [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52FC8184.5090606@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).