From: Jeff King <peff@peff.net>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH resend] Do not create commits whose message contains NUL
Date: Tue, 13 Dec 2011 12:59:32 -0500 [thread overview]
Message-ID: <20111213175932.GA1663@sigill.intra.peff.net> (raw)
In-Reply-To: <1323777368-19697-1-git-send-email-pclouds@gmail.com>
On Tue, Dec 13, 2011 at 06:56:08PM +0700, Nguyen Thai Ngoc Duy wrote:
> We assume that the commit log messages are uninterpreted sequences of
> non-NUL bytes (see Documentation/i18n.txt). However the assumption
> does not really stand out and it's quite easy to set an editor to save
> in a NUL-included encoding. Currently we silently cut at the first NUL
> we see.
>
> Make it more obvious that NUL is not welcome by refusing to create
> such commits. Those who deliberately want to create them can still do
> with hash-object.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> This is from UTF-16 in commit message discussion [1] a few months
> ago. I don't want to resurrect the discussion again. However I think
> it's a good idea to stop users from shooting themselves in this case,
> especially at porcelain level.
>
> There were no comments on this patch previously. So, any comments
> this time ? Should I drop it?
I think this is a sane thing to do. Having thought about and
experimented a little with utf-16 in the past few months, I really don't
see how you could be disrupting anybody's workflow. utf-16 messages get
butchered so badly already; we are much better off letting the user know
of the problem as soon as possible.
It looks like we already have a check for is_utf8, and this is not
failing that check. I guess because is_utf8 takes a NUL-terminated
buffer, so it simply sees the truncated result (i.e., depending on
endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check
only "f"). We could make is_utf8 take a length parameter to be more
accurate, and then it would catch this.
However, I think that's not quite what we want. We only check is_utf8 if
the encoding field is not set. And really, we want to reject NULs no
matter _which_ encoding they've set, because git simply doesn't handle
them properly.
> diff --git a/commit.c b/commit.c
> index d67b8c7..0775eec 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -855,6 +855,9 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,
Hmm. My version of commit_tree does not have a "msg_len" parameter, nor
do I have d67b8c7. Is there some refactoring patch this is based on that
I missed?
> + if (strlen(msg) < msg_len)
> + die(_("cannot commit with NUL in commit message"));
> +
Two nits:
1. For some reason, checking strlen(msg) seems a subtle way of looking
for NULs in a buffer. I would have found:
if (memchr(msg, '\0', msglen))
much more obvious. But perhaps it is just me. Certainly not a big
deal either way.
2. The error message could be a little friendlier. The likely reason
for NULs is a bogus encoding setting in the user's editor. We
already have a nice "your message isn't utf-8" message. Though it
does suggest setting i18n.commitencoding, which probably _isn't_
the solution here (since their encoding clearly isn't supported).
But maybe it would be nicer to say something like:
error: your commit message contains NUL characters.
hint: This is often caused by using multibyte encodings such as
hint: UTF-16. Please check your editor settings.
We could even go further and detect some common NUL-containing
encodings, but I don't think it's worth the effort.
> diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt
> new file mode 100644
> index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb
> GIT binary patch
> literal 32
> mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp<
I was disappointed not to find a secret message. :)
-Peff
next prev parent reply other threads:[~2011-12-13 17:59 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-13 17:59 ` Jeff King [this message]
2011-12-14 5:23 ` Miles Bader
2011-12-14 7:17 ` Jeff King
2012-01-01 16:27 ` Drew Northup
2012-01-03 20:03 ` Jeff King
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
2011-12-14 18:12 ` Junio C Hamano
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy
2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano
2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy
2011-12-14 18:13 ` Junio C Hamano
2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-14 18:19 ` Junio C Hamano
2011-12-14 18:29 ` Jeff King
2011-12-15 18:46 ` Junio C Hamano
2011-12-15 19:35 ` Junio C Hamano
2011-12-15 1:04 ` Miles Bader
2011-12-15 1:18 ` Jeff King
2011-12-15 3:09 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111213175932.GA1663@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.