git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH resend] Do not create commits whose message contains NUL
Date: Tue, 13 Dec 2011 12:59:32 -0500	[thread overview]
Message-ID: <20111213175932.GA1663@sigill.intra.peff.net> (raw)
In-Reply-To: <1323777368-19697-1-git-send-email-pclouds@gmail.com>

On Tue, Dec 13, 2011 at 06:56:08PM +0700, Nguyen Thai Ngoc Duy wrote:

> We assume that the commit log messages are uninterpreted sequences of
> non-NUL bytes (see Documentation/i18n.txt). However the assumption
> does not really stand out and it's quite easy to set an editor to save
> in a NUL-included encoding. Currently we silently cut at the first NUL
> we see.
> 
> Make it more obvious that NUL is not welcome by refusing to create
> such commits. Those who deliberately want to create them can still do
> with hash-object.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  This is from UTF-16 in commit message discussion [1] a few months
>  ago. I don't want to resurrect the discussion again. However I think
>  it's a good idea to stop users from shooting themselves in this case,
>  especially at porcelain level.
>
>  There were no comments on this patch previously. So, any comments
>  this time ? Should I drop it?

I think this is a sane thing to do. Having thought about and
experimented a little with utf-16 in the past few months, I really don't
see how you could be disrupting anybody's workflow. utf-16 messages get
butchered so badly already; we are much better off letting the user know
of the problem as soon as possible.

It looks like we already have a check for is_utf8, and this is not
failing that check. I guess because is_utf8 takes a NUL-terminated
buffer, so it simply sees the truncated result (i.e., depending on
endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check
only "f"). We could make is_utf8 take a length parameter to be more
accurate, and then it would catch this.

However, I think that's not quite what we want. We only check is_utf8 if
the encoding field is not set. And really, we want to reject NULs no
matter _which_ encoding they've set, because git simply doesn't handle
them properly.

> diff --git a/commit.c b/commit.c
> index d67b8c7..0775eec 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -855,6 +855,9 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,

Hmm. My version of commit_tree does not have a "msg_len" parameter, nor
do I have d67b8c7. Is there some refactoring patch this is based on that
I missed?

> +	if (strlen(msg) < msg_len)
> +		die(_("cannot commit with NUL in commit message"));
> +

Two nits:

  1. For some reason, checking strlen(msg) seems a subtle way of looking
     for NULs in a buffer. I would have found:

         if (memchr(msg, '\0', msglen))

     much more obvious. But perhaps it is just me. Certainly not a big
     deal either way.

  2. The error message could be a little friendlier. The likely reason
     for NULs is a bogus encoding setting in the user's editor. We
     already have a nice "your message isn't utf-8" message. Though it
     does suggest setting i18n.commitencoding, which probably _isn't_
     the solution here (since their encoding clearly isn't supported).
     But maybe it would be nicer to say something like:

       error: your commit message contains NUL characters.
       hint: This is often caused by using multibyte encodings such as
       hint: UTF-16. Please check your editor settings.

     We could even go further and detect some common NUL-containing
     encodings, but I don't think it's worth the effort.

> diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt
> new file mode 100644
> index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb
> GIT binary patch
> literal 32
> mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp<

I was disappointed not to find a secret message. :)

-Peff

  reply	other threads:[~2011-12-13 17:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-13 17:59 ` Jeff King [this message]
2011-12-14  5:23   ` Miles Bader
2011-12-14  7:17     ` Jeff King
2012-01-01 16:27   ` Drew Northup
2012-01-03 20:03     ` Jeff King
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
2011-12-14 14:08   ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
2011-12-14 18:12     ` Junio C Hamano
2011-12-15 13:47     ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
2011-12-15 13:47       ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy
2011-12-15 13:47       ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy
2011-12-15 18:23       ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano
2011-12-14 14:08   ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy
2011-12-14 18:13     ` Junio C Hamano
2011-12-14 14:08   ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-14 18:19     ` Junio C Hamano
2011-12-14 18:29       ` Jeff King
2011-12-15 18:46         ` Junio C Hamano
2011-12-15 19:35           ` Junio C Hamano
2011-12-15  1:04     ` Miles Bader
2011-12-15  1:18       ` Jeff King
2011-12-15  3:09         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111213175932.GA1663@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).