From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 2/2] fsck: detect and warn a commit with embedded NUL
Date: Thu, 14 Apr 2016 14:21:03 -0400
Message-ID: <20160414182102.GB22068@sigill.intra.peff.net>
In-Reply-To: <20160414180709.28968-2-gitster@pobox.com>

On Thu, Apr 14, 2016 at 11:07:09AM -0700, Junio C Hamano wrote:

> Even though a Git commit object is designed to be capable of storing
> any binary data as its payload, in practice people use it to describe
> the changes in textual form, and tools like "git log" are designed to
> treat the payload as text.
> 
> Detect and warn when we see any commit object with a NUL byte in
> it.
> 
> Note that a NUL byte in the header part is already detected as a
> grave error.  This change is purely about the message part.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Thanks, I was just reading over some of the old threads, and wondering
if it was time to resurrect this idea.

> @@ -610,6 +611,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>  	struct commit_graft *graft;
>  	unsigned parent_count, parent_line_count = 0, author_count;
>  	int err;
> +	const char *buffer_begin = buffer;
>  
>  	if (verify_headers(buffer, size, &commit->object, options))
>  		return -1;

You need this "buffer_begin" because we move the "buffer" pointer
forward as we parse. But perhaps whole-buffer checks should simply go at
the top (next to verify_headers) before we start advancing the pointer.
To me, that makes the function's flow more natural.
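To illustrate the ordering I mean, here is a standalone sketch (not the real fsck code). check_commit_sketch() and its warning counter are made-up stand-ins for fsck_commit_buffer() and report(), and the header loop is only a placeholder for the real per-header parsing:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for fsck_commit_buffer(): returns the number
 * of warnings instead of calling report().
 */
static int check_commit_sketch(const char *buffer, size_t size)
{
	const char *end = buffer + size;
	int warnings = 0;

	/* Whole-buffer check first, while "buffer" still points at the start. */
	if (memchr(buffer, '\0', size))
		warnings++;

	/* Only now start advancing the pointer, one header line at a time. */
	while (buffer < end && *buffer != '\n') {
		const char *eol = memchr(buffer, '\n', (size_t)(end - buffer));
		if (!eol)
			break;
		buffer = eol + 1;
	}

	return warnings;
}
```

With this ordering there is no need for a separate "buffer_begin" variable.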

But alternatively...

> @@ -671,6 +673,12 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>  		if (err)
>  			return err;
>  	}
> +	if (memchr(buffer_begin, '\0', size)) {
> +		err = report(options, &commit->object, FSCK_MSG_NUL_IN_COMMIT,
> +			     "NUL byte in the commit object body");
> +		if (err)
> +			return err;
> +	}

Here we've parsed to the end of the headers we know about. We know
there's no NUL there, because verify_headers() would have complained,
and so would the individual header parsers. So I actually think we could
check from "buffer" (of course we do still need to record the beginning
of the buffer to adjust "size" appropriately).

It's a little more efficient (we don't have to memchr over the same
bytes again). But I'd worry a little that doing it that way would
introduce coupling between this check and verify_headers() (so that if
the latter ever changes, our check may start missing cases).
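The size adjustment for checking from "buffer" would look roughly like the sketch below. nul_in_remainder() is a hypothetical helper, not something in fsck.c; it assumes "buffer" has already been advanced somewhere inside the object that starts at "buffer_begin":

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look for a NUL only in the bytes that have not
 * been parsed yet. "size" is the length of the whole object.
 */
static int nul_in_remainder(const char *buffer_begin, const char *buffer,
			    size_t size)
{
	/* Bytes already consumed by the header parsers. */
	size_t consumed = (size_t)(buffer - buffer_begin);

	return memchr(buffer, '\0', size - consumed) != NULL;
}
```

Note that a NUL sitting before "buffer" is invisible to this helper. That is exactly the coupling: it is only correct as long as the earlier checks really do reject a NUL in the headers.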

So yet another alternative would be to include this check in
verify_headers(). It would parse to the end of the headers as now, and
then from there additionally look for a NUL in the body.

Of the three approaches, I think I like that third one. It's the most
efficient, and I think the flow is pretty clear. We'd probably want to
rename verify_headers(), though. :)
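A rough standalone sketch of that third option follows. The function name and the OBJ_* result codes are invented for illustration, and the real function would call report() rather than return a code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, for this sketch only. */
enum {
	OBJ_OK = 0,
	OBJ_NUL_IN_HEADER,
	OBJ_UNTERMINATED_HEADER,
	OBJ_NUL_IN_BODY
};

/*
 * Scan the headers up to the blank line that ends them, then look for
 * a NUL in whatever follows. Each byte is examined at most once.
 */
static int verify_object_sketch(const char *buffer, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++) {
		if (buffer[i] == '\0')
			return OBJ_NUL_IN_HEADER;
		if (buffer[i] == '\n' && i + 1 < size && buffer[i + 1] == '\n') {
			/* Headers end here; the body starts after the blank line. */
			const char *body = buffer + i + 2;

			if (memchr(body, '\0', size - (i + 2)))
				return OBJ_NUL_IN_BODY;
			return OBJ_OK;
		}
	}

	return OBJ_UNTERMINATED_HEADER;
}
```

The header scan and the body scan live in one place, so the body check cannot silently drift out of sync with the header check.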

-Peff
