* [PATCH resend] Do not create commits whose message contains NUL @ 2011-12-13 11:56 Nguyễn Thái Ngọc Duy 2011-12-13 17:59 ` Jeff King 2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy 0 siblings, 2 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-13 11:56 UTC (permalink / raw) To: git; +Cc: Nguyễn Thái Ngọc Duy We assume that the commit log messages are uninterpreted sequences of non-NUL bytes (see Documentation/i18n.txt). However the assumption does not really stand out and it's quite easy to set an editor to save in a NUL-included encoding. Currently we silently cut at the first NUL we see. Make it more obvious that NUL is not welcome by refusing to create such commits. Those who deliberately want to create them can still do with hash-object. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- This is from UTF-16 in commit message discussion [1] a few months ago. I don't want to resurrect the discussion again. However I think it's a good idea to stop users from shooting themselves in this case, especially at porcelain level. There were no comments on this patch previously. So, any comments this time ? Should I drop it? [1] http://thread.gmane.org/gmane.comp.version-control.git/184123/focus=184335 commit.c | 3 +++ t/t3900-i18n-commit.sh | 6 ++++++ t/t3900/UTF-16.txt | Bin 0 -> 32 bytes 3 files changed, 9 insertions(+), 0 deletions(-) create mode 100644 t/t3900/UTF-16.txt diff --git a/commit.c b/commit.c index d67b8c7..0775eec 100644 --- a/commit.c +++ b/commit.c @@ -855,6 +855,9 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree, assert_sha1_type(tree, OBJ_TREE); + if (strlen(msg) < msg_len) + die(_("cannot commit with NUL in commit message")); + /* Not having i18n.commitencoding is the same as having utf-8 */ encoding_is_utf8 = is_encoding_utf8(git_commit_encoding); diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh index 1f62c15..d48a7c0 100755 --- a/t/t3900-i18n-commit.sh +++ b/t/t3900-i18n-commit.sh @@ -34,6 +34,12 @@ test_expect_success 'no encoding header for base case' ' test z = "z$E" ' +test_expect_failure 'UTF-16 refused because of NULs' ' + echo UTF-16 >F && + git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt +' + + for H in ISO8859-1 eucJP ISO-2022-JP do test_expect_success "$H setup" ' diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt new file mode 100644 index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb GIT binary patch literal 32 mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp< literal 0 HcmV?d00001 -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL 2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy @ 2011-12-13 17:59 ` Jeff King 2011-12-14 5:23 ` Miles Bader 2012-01-01 16:27 ` Drew Northup 2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy 1 sibling, 2 replies; 23+ messages in thread From: Jeff King @ 2011-12-13 17:59 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git On Tue, Dec 13, 2011 at 06:56:08PM +0700, Nguyen Thai Ngoc Duy wrote: > We assume that the commit log messages are uninterpreted sequences of > non-NUL bytes (see Documentation/i18n.txt). However the assumption > does not really stand out and it's quite easy to set an editor to save > in a NUL-included encoding. Currently we silently cut at the first NUL > we see. > > Make it more obvious that NUL is not welcome by refusing to create > such commits. Those who deliberately want to create them can still do > with hash-object. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > This is from UTF-16 in commit message discussion [1] a few months > ago. I don't want to resurrect the discussion again. However I think > it's a good idea to stop users from shooting themselves in this case, > especially at porcelain level. > > There were no comments on this patch previously. So, any comments > this time ? Should I drop it? I think this is a sane thing to do. Having thought about and experimented a little with utf-16 in the past few months, I really don't see how you could be disrupting anybody's workflow. utf-16 messages get butchered so badly already; we are much better off letting the user know of the problem as soon as possible. It looks like we already have a check for is_utf8, and this is not failing that check. I guess because is_utf8 takes a NUL-terminated buffer, so it simply sees the truncated result (i.e., depending on endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check only "f"). We could make is_utf8 take a length parameter to be more accurate, and then it would catch this. However, I think that's not quite what we want. We only check is_utf8 if the encoding field is not set. And really, we want to reject NULs no matter _which_ encoding they've set, because git simply doesn't handle them properly. > diff --git a/commit.c b/commit.c > index d67b8c7..0775eec 100644 > --- a/commit.c > +++ b/commit.c > @@ -855,6 +855,9 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree, Hmm. My version of commit_tree does not have a "msg_len" parameter, nor do I have d67b8c7. Is there some refactoring patch this is based on that I missed? > + if (strlen(msg) < msg_len) > + die(_("cannot commit with NUL in commit message")); > + Two nits: 1. For some reason, checking strlen(msg) seems a subtle way of looking for NULs in a buffer. I would have found: if (memchr(msg, '\0', msglen)) much more obvious. But perhaps it is just me. Certainly not a big deal either way. 2. The error message could be a little friendlier. The likely reason for NULs is a bogus encoding setting in the user's editor. We already have a nice "your message isn't utf-8" message. Though it does suggest setting i18n.commitencoding, which probably _isn't_ the solution here (since their encoding clearly isn't supported). But maybe it would be nicer to say something like: error: your commit message contains NUL characters. hint: This is often caused by using multibyte encodings such as hint: UTF-16. Please check your editor settings. We could even go further and detect some common NUL-containing encodings, but I don't think it's worth the effort. > diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt > new file mode 100644 > index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb > GIT binary patch > literal 32 > mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp< I was disappointed not to find a secret message. :) -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL 2011-12-13 17:59 ` Jeff King @ 2011-12-14 5:23 ` Miles Bader 2011-12-14 7:17 ` Jeff King 2012-01-01 16:27 ` Drew Northup 1 sibling, 1 reply; 23+ messages in thread From: Miles Bader @ 2011-12-14 5:23 UTC (permalink / raw) To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git Jeff King <peff@peff.net> writes: > But maybe it would be nicer to say something like: > > error: your commit message contains NUL characters. > hint: This is often caused by using multibyte encodings such as > hint: UTF-16. Please check your editor settings. I think the error message with the hint is much better for users, but isn't the term "multibyte" a little misleading here? It seems like it's really _wide_ encodings that are generally the culprit. [UTF-16 of course is particularly nasty in that it uses both units which are wider than a byte ("wide"), _and_ multiple units per code-point....] Thanks, -Miles -- Quack, n. A murderer without a license. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL 2011-12-14 5:23 ` Miles Bader @ 2011-12-14 7:17 ` Jeff King 0 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2011-12-14 7:17 UTC (permalink / raw) To: Miles Bader; +Cc: Nguyễn Thái Ngọc Duy, git On Wed, Dec 14, 2011 at 02:23:29PM +0900, Miles Bader wrote: > Jeff King <peff@peff.net> writes: > > But maybe it would be nicer to say something like: > > > > error: your commit message contains NUL characters. > > hint: This is often caused by using multibyte encodings such as > > hint: UTF-16. Please check your editor settings. > > I think the error message with the hint is much better for users, but > isn't the term "multibyte" a little misleading here? It seems like > it's really _wide_ encodings that are generally the culprit. Yeah, wide is probably a better term. I'm not sure it is rigorously defined anywhere, but in general I think it refers to the set of encodings that do not care about the embedding of 8-bit ascii bytes as subsets. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL 2011-12-13 17:59 ` Jeff King 2011-12-14 5:23 ` Miles Bader @ 2012-01-01 16:27 ` Drew Northup 2012-01-03 20:03 ` Jeff King 1 sibling, 1 reply; 23+ messages in thread From: Drew Northup @ 2012-01-01 16:27 UTC (permalink / raw) To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git On Tue, 2011-12-13 at 12:59 -0500, Jeff King wrote: > It looks like we already have a check for is_utf8, and this is not > failing that check. I guess because is_utf8 takes a NUL-terminated > buffer, so it simply sees the truncated result (i.e., depending on > endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check > only "f"). We could make is_utf8 take a length parameter to be more > accurate, and then it would catch this. > > However, I think that's not quite what we want. We only check is_utf8 if > the encoding field is not set. And really, we want to reject NULs no > matter _which_ encoding they've set, because git simply doesn't handle > them properly. I had already started experimenting with automatically detecting decent UTF-16 a long while back so that compatible platforms could handle it appropriately in terms of creating diffs and dealing with newline munging between platforms. There is no 100% sure-fire check for UTF-16 if you don't already suspect it is possibly UTF-16. If we really want to check for possible UTF-16 specifically I can scrape out the check I wrote up and send it along. The is_utf8 check was not written to detect 100% valid UTF-8 per-se. It seems to me that it was written as part of the "is this a binary or not" check in the add/commit path. I have thought for some time that specifying buffer length in that whole code path would be a good idea (but I thought that somebody else had taken up that battle while I was busy dealing with other problems elsewhere), if for no other reason it would force it to deal with NULs more intelligently. -- -Drew Northup ________________________________________________ "As opposed to vegetable or mineral error?" -John Pescatore, SANS NewsBites Vol. 12 Num. 59 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL 2012-01-01 16:27 ` Drew Northup @ 2012-01-03 20:03 ` Jeff King 0 siblings, 0 replies; 23+ messages in thread From: Jeff King @ 2012-01-03 20:03 UTC (permalink / raw) To: Drew Northup; +Cc: Nguyễn Thái Ngọc Duy, git On Sun, Jan 01, 2012 at 11:27:31AM -0500, Drew Northup wrote: > I had already started experimenting with automatically detecting decent > UTF-16 a long while back so that compatible platforms could handle it > appropriately in terms of creating diffs and dealing with newline > munging between platforms. There is no 100% sure-fire check for UTF-16 > if you don't already suspect it is possibly UTF-16. If we really want to > check for possible UTF-16 specifically I can scrape out the check I > wrote up and send it along. I also looked into this recently. You can generally detect UTF-16 by the BOM at the beginning of the file (which will also tell you the endian-ness). I did a simple test by integrating it into the check for binary-ness during diffs. However, as I recall, the result wasn't particularly useful. Some of the diff code wasn't happy with the embedded NUL bytes (i.e., there is code that assumes that NUL is the end of a string). Not to mention that ascii newline (0x0a) can appear as part of other characters in a wide encoding like utf-16. And since git outputs straight ascii for all of the diff boilerplate, you end up with a mish-mash of utf-16 and ascii (this is OK with utf-8, of course, because utf-8 is a superset of ascii). If anything, I think you would want to do something like "textconv" to convert the utf-16 into utf-8, then diff that. Git won't do it automatically based on encoding, but if you know the filenames of the utf-16 files in your repository, you can do something like: echo 'foo.txt diff=utf16' >.gitattributes git config diff.utf16.textconv 'iconv -f utf16 -t utf8' and get readable diffs. Of course you couldn't use that diff to apply a patch, though. I strongly suspect that not many people are really using git for utf-16 files. Git treats them as binary, which makes them unpleasant for anything except simple storage. > The is_utf8 check was not written to detect 100% valid UTF-8 per-se. It > seems to me that it was written as part of the "is this a binary or not" > check in the add/commit path. We shouldn't care about binary file content at all in the add or commit code paths. I would guess we do only if you are using auto-crlf (but then, I don't think we care about utf8 in that cases, only whether line endings should be converted or not). We do check that the commit message itself is utf8, but only to generate a warning that you should set i81n.commitencoding. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/3] git-commit rejects messages with NULs 2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy 2011-12-13 17:59 ` Jeff King @ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy ` (2 more replies) 1 sibling, 3 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw) To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy Hi, I'm sorry I forgot the patch that makes commit_tree() take message length. This version rewords the error message and use memchr() instead. Nguyễn Thái Ngọc Duy (3): Make commit_tree() take message length in addition to the commit message merge: abort if fails to commit Do not create commits whose message contains NUL Documentation/config.txt | 4 ++++ advice.c | 2 ++ advice.h | 1 + builtin/commit-tree.c | 2 +- builtin/commit.c | 2 +- builtin/merge.c | 8 ++++++-- builtin/notes.c | 2 +- commit.c | 13 +++++++++++-- commit.h | 2 +- notes-cache.c | 2 +- notes-merge.c | 8 ++++---- notes-merge.h | 2 +- t/t3900-i18n-commit.sh | 6 ++++++ t/t3900/UTF-16.txt | Bin 0 -> 32 bytes 14 files changed, 40 insertions(+), 14 deletions(-) create mode 100644 t/t3900/UTF-16.txt -- 1.7.8.36.g69ee2 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/3] Make commit_tree() take message length in addition to the commit message 2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy 2011-12-14 18:12 ` Junio C Hamano 2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy 2 siblings, 2 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw) To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/commit-tree.c | 2 +- builtin/commit.c | 2 +- builtin/merge.c | 4 ++-- builtin/notes.c | 2 +- commit.c | 4 ++-- commit.h | 2 +- notes-cache.c | 2 +- notes-merge.c | 8 ++++---- notes-merge.h | 2 +- 9 files changed, 14 insertions(+), 14 deletions(-) diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c index d083795..8fa384f 100644 --- a/builtin/commit-tree.c +++ b/builtin/commit-tree.c @@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix) if (strbuf_read(&buffer, 0, 0) < 0) die_errno("git commit-tree: failed to read"); - if (commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) { + if (commit_tree(buffer.buf, buffer.len, tree_sha1, parents, commit_sha1, NULL)) { strbuf_release(&buffer); return 1; } diff --git a/builtin/commit.c b/builtin/commit.c index 8f2bebe..ce0e47f 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1483,7 +1483,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) exit(1); } - if (commit_tree(sb.buf, active_cache_tree->sha1, parents, sha1, + if (commit_tree(sb.buf, sb.len, active_cache_tree->sha1, parents, sha1, author_ident.buf)) { rollback_index_files(); die(_("failed to write commit object")); diff --git a/builtin/merge.c b/builtin/merge.c index 2870a6a..df4548a 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -913,7 +913,7 @@ static int merge_trivial(struct commit *head) parent->next->item = remoteheads->item; parent->next->next = NULL; prepare_to_commit(); - commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL); + commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL); finish(head, result_commit, "In-index merge"); drop_save(); return 0; @@ -944,7 +944,7 @@ static int finish_automerge(struct commit *head, strbuf_addch(&merge_msg, '\n'); prepare_to_commit(); free_commit_list(remoteheads); - commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL); + commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, result_commit, NULL); strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy); finish(head, result_commit, buf.buf); strbuf_release(&buf); diff --git a/builtin/notes.c b/builtin/notes.c index f8e437d..d665459 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -306,7 +306,7 @@ void commit_notes(struct notes_tree *t, const char *msg) if (buf.buf[buf.len - 1] != '\n') strbuf_addch(&buf, '\n'); /* Make sure msg ends with newline */ - create_notes_commit(t, NULL, buf.buf + 7, commit_sha1); + create_notes_commit(t, NULL, buf.buf + 7, buf.len - 7, commit_sha1); update_ref(buf.buf, t->ref, commit_sha1, NULL, 0, DIE_ON_ERR); strbuf_release(&buf); diff --git a/commit.c b/commit.c index 73b7e00..d67b8c7 100644 --- a/commit.c +++ b/commit.c @@ -845,7 +845,7 @@ static const char commit_utf8_warn[] = "You may want to amend it after fixing the message, or set the config\n" "variable i18n.commitencoding to the encoding your project uses.\n"; -int commit_tree(const char *msg, unsigned char *tree, +int commit_tree(const char *msg, size_t msg_len, unsigned char *tree, struct commit_list *parents, unsigned char *ret, const char *author) { @@ -884,7 +884,7 @@ int commit_tree(const char *msg, unsigned char *tree, strbuf_addch(&buffer, '\n'); /* And add the comment */ - strbuf_addstr(&buffer, msg); + strbuf_add(&buffer, msg, msg_len); /* And check the encoding */ if (encoding_is_utf8 && !is_utf8(buffer.buf)) diff --git a/commit.h b/commit.h index 009b113..1acaf53 100644 --- a/commit.h +++ b/commit.h @@ -181,7 +181,7 @@ static inline int single_parent(struct commit *commit) struct commit_list *reduce_heads(struct commit_list *heads); -extern int commit_tree(const char *msg, unsigned char *tree, +extern int commit_tree(const char *msg, size_t msg_len, unsigned char *tree, struct commit_list *parents, unsigned char *ret, const char *author); diff --git a/notes-cache.c b/notes-cache.c index 4c8984e..04a5698 100644 --- a/notes-cache.c +++ b/notes-cache.c @@ -56,7 +56,7 @@ int notes_cache_write(struct notes_cache *c) if (write_notes_tree(&c->tree, tree_sha1)) return -1; - if (commit_tree(c->validity, tree_sha1, NULL, commit_sha1, NULL) < 0) + if (commit_tree(c->validity, strlen(c->validity), tree_sha1, NULL, commit_sha1, NULL) < 0) return -1; if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL, 0, QUIET_ON_ERR) < 0) diff --git a/notes-merge.c b/notes-merge.c index ce10aac..b3baaf4 100644 --- a/notes-merge.c +++ b/notes-merge.c @@ -530,7 +530,7 @@ static int merge_from_diffs(struct notes_merge_options *o, } void create_notes_commit(struct notes_tree *t, struct commit_list *parents, - const char *msg, unsigned char *result_sha1) + const char *msg, size_t msg_len, unsigned char *result_sha1) { unsigned char tree_sha1[20]; @@ -551,7 +551,7 @@ void create_notes_commit(struct notes_tree *t, struct commit_list *parents, /* else: t->ref points to nothing, assume root/orphan commit */ } - if (commit_tree(msg, tree_sha1, parents, result_sha1, NULL)) + if (commit_tree(msg, msg_len, tree_sha1, parents, result_sha1, NULL)) die("Failed to commit notes tree to database"); } @@ -669,7 +669,7 @@ int notes_merge(struct notes_merge_options *o, commit_list_insert(remote, &parents); /* LIFO order */ commit_list_insert(local, &parents); create_notes_commit(local_tree, parents, o->commit_msg.buf, - result_sha1); + o->commit_msg.len, result_sha1); } found_result: @@ -734,7 +734,7 @@ int notes_merge_commit(struct notes_merge_options *o, } create_notes_commit(partial_tree, partial_commit->parents, msg, - result_sha1); + strlen(msg), result_sha1); if (o->verbosity >= 4) printf("Finalized notes merge commit: %s\n", sha1_to_hex(result_sha1)); diff --git a/notes-merge.h b/notes-merge.h index 168a672..fd52988 100644 --- a/notes-merge.h +++ b/notes-merge.h @@ -37,7 +37,7 @@ void init_notes_merge_options(struct notes_merge_options *o); * The resulting commit SHA1 is stored in result_sha1. */ void create_notes_commit(struct notes_tree *t, struct commit_list *parents, - const char *msg, unsigned char *result_sha1); + const char *msg, size_t msg_len, unsigned char *result_sha1); /* * Merge notes from o->remote_ref into o->local_ref -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] Make commit_tree() take message length in addition to the commit message 2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy @ 2011-12-14 18:12 ` Junio C Hamano 2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy 1 sibling, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2011-12-14 18:12 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Justification? As all 3 primary users of this API feed strbuf.buf to the function, it would make more sense to change the first parameter to a pointer to a strbuf, no? ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 1/3] merge: abort if fails to commit 2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy 2011-12-14 18:12 ` Junio C Hamano @ 2011-12-15 13:47 ` Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy ` (2 more replies) 1 sibling, 3 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 UTC (permalink / raw) To: git Cc: Jeff King, Miles Bader, Junio C Hamano, Nguyễn Thái Ngọc Duy Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- 2011/12/15 Junio C Hamano <gitster@pobox.com>: >> - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL); >> + if (commit_tree(merge_msg.buf, merge_msg.len, >> + result_tree, parent, result_commit, NULL)) >> + die(_("failed to write commit object")); >> finish(head, result_commit, "In-index merge"); >> drop_save(); >> return 0; > > Should we die immediately, or should we do some clean-ups after ourselves > before doing so? I'm not sure. I had a quick look over the command and it seems we do not need to do any clean-ups. But I'm not familiar with the command anyway.. > In any case, this is a good change that shouldn't be taken hostage to the > unrelated change in patch [1/3]. Moved it up so it you can cherry-pick it independently. builtin/merge.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/builtin/merge.c b/builtin/merge.c index 2870a6a..27576c0 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -913,7 +913,8 @@ static int merge_trivial(struct commit *head) parent->next->item = remoteheads->item; parent->next->next = NULL; prepare_to_commit(); - commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL); + if (commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL)) + die(_("failed to write commit object")); finish(head, result_commit, "In-index merge"); drop_save(); return 0; @@ -944,7 +945,8 @@ static int finish_automerge(struct commit *head, strbuf_addch(&merge_msg, '\n'); prepare_to_commit(); free_commit_list(remoteheads); - commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL); + if (commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL)) + die(_("failed to write commit object")); strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy); finish(head, result_commit, buf.buf); strbuf_release(&buf); -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 2/3] Convert commit_tree() to take strbuf as message 2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 ` Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy 2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano 2 siblings, 0 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 UTC (permalink / raw) To: git Cc: Jeff King, Miles Bader, Junio C Hamano, Nguyễn Thái Ngọc Duy Because strbuf provides message length, we can create commits that include NULs. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/commit-tree.c | 2 +- builtin/commit.c | 2 +- builtin/merge.c | 4 ++-- builtin/notes.c | 4 ++-- commit.c | 4 ++-- commit.h | 2 +- notes-cache.c | 5 ++++- notes-merge.c | 10 ++++++---- notes-merge.h | 2 +- 9 files changed, 20 insertions(+), 15 deletions(-) diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c index d083795..0895861 100644 --- a/builtin/commit-tree.c +++ b/builtin/commit-tree.c @@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix) if (strbuf_read(&buffer, 0, 0) < 0) die_errno("git commit-tree: failed to read"); - if (commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) { + if (commit_tree(&buffer, tree_sha1, parents, commit_sha1, NULL)) { strbuf_release(&buffer); return 1; } diff --git a/builtin/commit.c b/builtin/commit.c index 8f2bebe..849151e 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1483,7 +1483,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) exit(1); } - if (commit_tree(sb.buf, active_cache_tree->sha1, parents, sha1, + if (commit_tree(&sb, active_cache_tree->sha1, parents, sha1, author_ident.buf)) { rollback_index_files(); die(_("failed to write commit object")); diff --git a/builtin/merge.c b/builtin/merge.c index 27576c0..e066bf1 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -913,7 +913,7 @@ static int merge_trivial(struct commit *head) parent->next->item = remoteheads->item; parent->next->next = NULL; prepare_to_commit(); - if (commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL)) + if (commit_tree(&merge_msg, result_tree, parent, result_commit, NULL)) die(_("failed to write commit object")); finish(head, result_commit, "In-index merge"); drop_save(); @@ -945,7 +945,7 @@ static int finish_automerge(struct commit *head, strbuf_addch(&merge_msg, '\n'); prepare_to_commit(); free_commit_list(remoteheads); - if (commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL)) + if (commit_tree(&merge_msg, result_tree, parents, result_commit, NULL)) die(_("failed to write commit object")); strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy); finish(head, result_commit, buf.buf); diff --git a/builtin/notes.c b/builtin/notes.c index f8e437d..5e32548 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -301,12 +301,12 @@ void commit_notes(struct notes_tree *t, const char *msg) return; /* don't have to commit an unchanged tree */ /* Prepare commit message and reflog message */ - strbuf_addstr(&buf, "notes: "); /* commit message starts at index 7 */ strbuf_addstr(&buf, msg); if (buf.buf[buf.len - 1] != '\n') strbuf_addch(&buf, '\n'); /* Make sure msg ends with newline */ - create_notes_commit(t, NULL, buf.buf + 7, commit_sha1); + create_notes_commit(t, NULL, &buf, commit_sha1); + strbuf_insert(&buf, 0, "notes: ", 7); /* commit message starts at index 7 */ update_ref(buf.buf, t->ref, commit_sha1, NULL, 0, DIE_ON_ERR); strbuf_release(&buf); diff --git a/commit.c b/commit.c index 73b7e00..0a214a6 100644 --- a/commit.c +++ b/commit.c @@ -845,7 +845,7 @@ static const char commit_utf8_warn[] = "You may want to amend it after fixing the message, or set the config\n" "variable i18n.commitencoding to the encoding your project uses.\n"; -int commit_tree(const char *msg, unsigned char *tree, +int commit_tree(const struct strbuf *msg, unsigned char *tree, struct commit_list *parents, unsigned char *ret, const char *author) { @@ -884,7 +884,7 @@ int commit_tree(const char *msg, unsigned char *tree, strbuf_addch(&buffer, '\n'); /* And add the comment */ - strbuf_addstr(&buffer, msg); + strbuf_addbuf(&buffer, msg); /* And check the encoding */ if (encoding_is_utf8 && !is_utf8(buffer.buf)) diff --git a/commit.h b/commit.h index 009b113..5cf46b2 100644 --- a/commit.h +++ b/commit.h @@ -181,7 +181,7 @@ static inline int single_parent(struct commit *commit) struct commit_list *reduce_heads(struct commit_list *heads); -extern int commit_tree(const char *msg, unsigned char *tree, +extern int commit_tree(const struct strbuf *msg, unsigned char *tree, struct commit_list *parents, unsigned char *ret, const char *author); diff --git a/notes-cache.c b/notes-cache.c index 4c8984e..bea013e 100644 --- a/notes-cache.c +++ b/notes-cache.c @@ -48,6 +48,7 @@ int notes_cache_write(struct notes_cache *c) { unsigned char tree_sha1[20]; unsigned char commit_sha1[20]; + struct strbuf msg = STRBUF_INIT; if (!c || !c->tree.initialized || !c->tree.ref || !*c->tree.ref) return -1; @@ -56,7 +57,9 @@ int notes_cache_write(struct notes_cache *c) if (write_notes_tree(&c->tree, tree_sha1)) return -1; - if (commit_tree(c->validity, tree_sha1, NULL, commit_sha1, NULL) < 0) + strbuf_attach(&msg, c->validity, + strlen(c->validity), strlen(c->validity) + 1); + if (commit_tree(&msg, tree_sha1, NULL, commit_sha1, NULL) < 0) return -1; if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL, 0, QUIET_ON_ERR) < 0) diff --git a/notes-merge.c b/notes-merge.c index ce10aac..b5a36ac 100644 --- a/notes-merge.c +++ b/notes-merge.c @@ -530,7 +530,7 @@ static int merge_from_diffs(struct notes_merge_options *o, } void create_notes_commit(struct notes_tree *t, struct commit_list *parents, - const char *msg, unsigned char *result_sha1) + const struct strbuf *msg, unsigned char *result_sha1) { unsigned char tree_sha1[20]; @@ -668,7 +668,7 @@ int notes_merge(struct notes_merge_options *o, struct commit_list *parents = NULL; commit_list_insert(remote, &parents); /* LIFO order */ commit_list_insert(local, &parents); - create_notes_commit(local_tree, parents, o->commit_msg.buf, + create_notes_commit(local_tree, parents, &o->commit_msg, result_sha1); } @@ -695,7 +695,8 @@ int notes_merge_commit(struct notes_merge_options *o, struct dir_struct dir; char *path = xstrdup(git_path(NOTES_MERGE_WORKTREE "/")); int path_len = strlen(path), i; - const char *msg = strstr(partial_commit->buffer, "\n\n"); + char *msg = strstr(partial_commit->buffer, "\n\n"); + struct strbuf sb_msg = STRBUF_INIT; if (o->verbosity >= 3) printf("Committing notes in notes merge worktree at %.*s\n", @@ -733,7 +734,8 @@ int notes_merge_commit(struct notes_merge_options *o, sha1_to_hex(obj_sha1), sha1_to_hex(blob_sha1)); } - create_notes_commit(partial_tree, partial_commit->parents, msg, + strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1); + create_notes_commit(partial_tree, partial_commit->parents, &sb_msg, result_sha1); if (o->verbosity >= 4) printf("Finalized notes merge commit: %s\n", diff --git a/notes-merge.h b/notes-merge.h index 168a672..0c11b17 100644 --- a/notes-merge.h +++ b/notes-merge.h @@ -37,7 +37,7 @@ void init_notes_merge_options(struct notes_merge_options *o); * The resulting commit SHA1 is stored in result_sha1. */ void create_notes_commit(struct notes_tree *t, struct commit_list *parents, - const char *msg, unsigned char *result_sha1); + const struct strbuf *msg, unsigned char *result_sha1); /* * Merge notes from o->remote_ref into o->local_ref -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 3/3] commit: refuse commit messages that contain NULs 2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 ` Nguyễn Thái Ngọc Duy 2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano 2 siblings, 0 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 UTC (permalink / raw) To: git Cc: Jeff King, Miles Bader, Junio C Hamano, Nguyễn Thái Ngọc Duy Current implementation sees NUL as terminator. If users give a NUL-included message (e.g. editor accidentally set to save as UTF-16), the new commit message will have NULs. However following operations (displaying or amending a commit for example) will not show anything after the first NUL. Stop user right when they do this. If NUL is added by mistake, they have their chance to fix. If they know that they are doing, commit-tree will gladly commit whatever is given. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- Advice stuff dropped. I realized quite late that commit_tree() is also used for plumbing commands (also thanks to Junio's comments), while I wanted to check at porcelain level only. So I moved the check up to builtin/commit.c. If we need the same check for other commands, which I doubt, similar checks can be added. builtin/commit.c | 7 +++++++ t/t3900-i18n-commit.sh | 6 ++++++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/builtin/commit.c b/builtin/commit.c index 849151e..5db7673 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1483,6 +1483,13 @@ int cmd_commit(int argc, const char **argv, const char *prefix) exit(1); } + if (memchr(sb.buf, '\0', sb.len)) { + rollback_index_files(); + die(_("your commit message contains NUL characters.\n" + "hint: This is often caused by using wide encodings such as\n" + "hint: UTF-16. Please check your editor settings.")); + } + if (commit_tree(&sb, active_cache_tree->sha1, parents, sha1, author_ident.buf)) { rollback_index_files(); diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh index 1f62c15..d48a7c0 100755 --- a/t/t3900-i18n-commit.sh +++ b/t/t3900-i18n-commit.sh @@ -34,6 +34,12 @@ test_expect_success 'no encoding header for base case' ' test z = "z$E" ' +test_expect_failure 'UTF-16 refused because of NULs' ' + echo UTF-16 >F && + git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt +' + + for H in ISO8859-1 eucJP ISO-2022-JP do test_expect_success "$H setup" ' -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/3] merge: abort if fails to commit 2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy @ 2011-12-15 18:23 ` Junio C Hamano 2 siblings, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2011-12-15 18:23 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/3] merge: abort if fails to commit 2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy 2011-12-14 18:13 ` Junio C Hamano 2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy 2 siblings, 1 reply; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw) To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- builtin/merge.c | 8 ++++++-- 1 files changed, 6 insertions(+), 2 deletions(-) diff --git a/builtin/merge.c b/builtin/merge.c index df4548a..e57eefa 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -913,7 +913,9 @@ static int merge_trivial(struct commit *head) parent->next->item = remoteheads->item; parent->next->next = NULL; prepare_to_commit(); - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL); + if (commit_tree(merge_msg.buf, merge_msg.len, + result_tree, parent, result_commit, NULL)) + die(_("failed to write commit object")); finish(head, result_commit, "In-index merge"); drop_save(); return 0; @@ -944,7 +946,9 @@ static int finish_automerge(struct commit *head, strbuf_addch(&merge_msg, '\n'); prepare_to_commit(); free_commit_list(remoteheads); - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, result_commit, NULL); + if (commit_tree(merge_msg.buf, merge_msg.len, + result_tree, parents, result_commit, NULL)) + die(_("failed to write commit object")); strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy); finish(head, result_commit, buf.buf); strbuf_release(&buf); -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] merge: abort if fails to commit 2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy @ 2011-12-14 18:13 ` Junio C Hamano 0 siblings, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2011-12-14 18:13 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > --- > builtin/merge.c | 8 ++++++-- > 1 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/builtin/merge.c b/builtin/merge.c > index df4548a..e57eefa 100644 > --- a/builtin/merge.c > +++ b/builtin/merge.c > @@ -913,7 +913,9 @@ static int merge_trivial(struct commit *head) > parent->next->item = remoteheads->item; > parent->next->next = NULL; > prepare_to_commit(); > - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL); > + if (commit_tree(merge_msg.buf, merge_msg.len, > + result_tree, parent, result_commit, NULL)) > + die(_("failed to write commit object")); > finish(head, result_commit, "In-index merge"); > drop_save(); > return 0; Should we die immediately, or should we do some clean-ups after ourselves before doing so? In any case, this is a good change that shouldn't be taken hostage to the unrelated change in patch [1/3]. Thanks. > @@ -944,7 +946,9 @@ static int finish_automerge(struct commit *head, > strbuf_addch(&merge_msg, '\n'); > prepare_to_commit(); > free_commit_list(remoteheads); > - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, result_commit, NULL); > + if (commit_tree(merge_msg.buf, merge_msg.len, > + result_tree, parents, result_commit, NULL)) > + die(_("failed to write commit object")); > strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy); > finish(head, result_commit, buf.buf); > strbuf_release(&buf); ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy 2011-12-14 18:19 ` Junio C Hamano 2011-12-15 1:04 ` Miles Bader 2 siblings, 2 replies; 23+ messages in thread From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw) To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy We assume that the commit log messages are uninterpreted sequences of non-NUL bytes (see Documentation/i18n.txt). However the assumption does not really stand out and it's quite easy to set an editor to save in a NUL-included encoding. Currently we silently cut at the first NUL we see. Make it more obvious that NUL is not welcome by refusing to create such commits. Those who deliberately want to create them can still do with hash-object. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> --- Documentation/config.txt | 4 ++++ advice.c | 2 ++ advice.h | 1 + commit.c | 9 +++++++++ t/t3900-i18n-commit.sh | 6 ++++++ t/t3900/UTF-16.txt | Bin 0 -> 32 bytes 6 files changed, 22 insertions(+), 0 deletions(-) create mode 100644 t/t3900/UTF-16.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 5a841da..daf57c2 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -144,6 +144,10 @@ advice.*:: Advice shown when you used linkgit::git-checkout[1] to move to the detach HEAD state, to instruct how to create a local branch after the fact. Default: true. + commitWideEncoding:: + Advice shown when linkgit::git-commit[1] refuses to + proceed because there are NULs in commit message. + Default: true. -- core.fileMode:: diff --git a/advice.c b/advice.c index e02e632..130949e 100644 --- a/advice.c +++ b/advice.c @@ -6,6 +6,7 @@ int advice_commit_before_merge = 1; int advice_resolve_conflict = 1; int advice_implicit_identity = 1; int advice_detached_head = 1; +int advice_commmit_wide_encoding = 1; static struct { const char *name; @@ -17,6 +18,7 @@ static struct { { "resolveconflict", &advice_resolve_conflict }, { "implicitidentity", &advice_implicit_identity }, { "detachedhead", &advice_detached_head }, + { "commitwideencoding", &advice_commmit_wide_encoding }, }; void advise(const char *advice, ...) diff --git a/advice.h b/advice.h index e5d0af7..d913bdb 100644 --- a/advice.h +++ b/advice.h @@ -9,6 +9,7 @@ extern int advice_commit_before_merge; extern int advice_resolve_conflict; extern int advice_implicit_identity; extern int advice_detached_head; +extern int advice_commmit_wide_encoding; int git_default_advice_config(const char *var, const char *value); void advise(const char *advice, ...); diff --git a/commit.c b/commit.c index d67b8c7..59e5bce 100644 --- a/commit.c +++ b/commit.c @@ -855,6 +855,15 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree, assert_sha1_type(tree, OBJ_TREE); + if (memchr(msg, '\0', msg_len)) { + error(_("your commit message contains NUL characters.")); + if (advice_commmit_wide_encoding) { + advise(_("This is often caused by using wide encodings such as")); + advise(_("UTF-16. Please check your editor settings.")); + } + return -1; + } + /* Not having i18n.commitencoding is the same as having utf-8 */ encoding_is_utf8 = is_encoding_utf8(git_commit_encoding); diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh index 1f62c15..d48a7c0 100755 --- a/t/t3900-i18n-commit.sh +++ b/t/t3900-i18n-commit.sh @@ -34,6 +34,12 @@ test_expect_success 'no encoding header for base case' ' test z = "z$E" ' +test_expect_failure 'UTF-16 refused because of NULs' ' + echo UTF-16 >F && + git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt +' + + for H in ISO8859-1 eucJP ISO-2022-JP do test_expect_success "$H setup" ' diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt new file mode 100644 index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb GIT binary patch literal 32 mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp< literal 0 HcmV?d00001 -- 1.7.8.36.g69ee2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy @ 2011-12-14 18:19 ` Junio C Hamano 2011-12-14 18:29 ` Jeff King 2011-12-15 1:04 ` Miles Bader 1 sibling, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2011-12-14 18:19 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes: > We assume that the commit log messages are uninterpreted sequences of > non-NUL bytes (see Documentation/i18n.txt). However the assumption > does not really stand out and it's quite easy to set an editor to save > in a NUL-included encoding. Currently we silently cut at the first NUL > we see. > > Make it more obvious that NUL is not welcome by refusing to create > such commits. Those who deliberately want to create them can still do > with hash-object. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Limiting the Porcelain layer to deal only with reasonable text encodings (yes, I am declaring that utf16 is not among them) is perfectly fine, but I was somehow hoping that you would allow the option for the low-level function commit_tree() to create a commit object with binary blob in the body part, especially after seeing the patch 1/3 to do so. Certainly that kind of usage would not give the binary blob literally in "git log" output, but it is with or without the issue around NUL byte. A custom program linked with commit.c to call commit_tree() may not be using the data structure to store anything that is meant to be read by "git log" to begin with. Not a strong veto at all, just throwing out something to think about. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-14 18:19 ` Junio C Hamano @ 2011-12-14 18:29 ` Jeff King 2011-12-15 18:46 ` Junio C Hamano 0 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2011-12-14 18:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Miles Bader On Wed, Dec 14, 2011 at 10:19:16AM -0800, Junio C Hamano wrote: > Limiting the Porcelain layer to deal only with reasonable text encodings > (yes, I am declaring that utf16 is not among them) is perfectly fine, but > I was somehow hoping that you would allow the option for the low-level > function commit_tree() to create a commit object with binary blob in the > body part, especially after seeing the patch 1/3 to do so. > > Certainly that kind of usage would not give the binary blob literally in > "git log" output, but it is with or without the issue around NUL byte. A > custom program linked with commit.c to call commit_tree() may not be using > the data structure to store anything that is meant to be read by "git log" > to begin with. I'm happy to ignore custom programs linking against internal git code, but what should "git commit-tree" do? My gut feeling is that it should store the literal binary contents. However, I don't think this has ever been the case. Even in the initial version of commit-tree.c, we read the input line-by-line and sprintf it into place. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-14 18:29 ` Jeff King @ 2011-12-15 18:46 ` Junio C Hamano 2011-12-15 19:35 ` Junio C Hamano 0 siblings, 1 reply; 23+ messages in thread From: Junio C Hamano @ 2011-12-15 18:46 UTC (permalink / raw) To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git, Miles Bader Jeff King <peff@peff.net> writes: > I'm happy to ignore custom programs linking against internal git code, > but what should "git commit-tree" do? > > My gut feeling is that it should store the literal binary contents. > However, I don't think this has ever been the case. Even in the initial > version of commit-tree.c, we read the input line-by-line and sprintf it > into place. Yeah, you are right. Perhaps we should tweak updated 3/3 to check at the lower level commit_tree() then. I've rewrote the log message for 2/3 as follows so we can go either way ;-) Convert commit_tree() to take strbuf as message There wan't a way for commit_tree() to notice if the message the caller prepared contained a NUL byte, as it did not take the length of the message as a parameter. Use a pointer to a strbuf instead, so that we can either choose to allow low-level plumbing commands to make commits that contain NUL byte in its message, or forbid NUL everywhere by adding the check in commit_tree(), in later patches. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-15 18:46 ` Junio C Hamano @ 2011-12-15 19:35 ` Junio C Hamano 0 siblings, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2011-12-15 19:35 UTC (permalink / raw) To: git; +Cc: Jeff King, Nguyễn Thái Ngọc Duy, Miles Bader Junio C Hamano <gitster@pobox.com> writes: >> My gut feeling is that it should store the literal binary contents. >> However, I don't think this has ever been the case. Even in the initial >> version of commit-tree.c, we read the input line-by-line and sprintf it >> into place. > > Yeah, you are right. Perhaps we should tweak updated 3/3 to check at the > lower level commit_tree() then. > > I've rewrote the log message for 2/3 as follows so we can go either way > ;-) s/rewrote/rewritten/ obviously... > Convert commit_tree() to take strbuf as message > > There wan't a way for commit_tree() to notice if the message the caller > prepared contained a NUL byte, as it did not take the length of the > message as a parameter. Use a pointer to a strbuf instead, so that we can > either choose to allow low-level plumbing commands to make commits > that contain NUL byte in its message, or forbid NUL everywhere by > adding the check in commit_tree(), in later patches. > > Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > Signed-off-by: Junio C Hamano <gitster@pobox.com> And 3/3 looks like this: commit_tree(): refuse commit messages that contain NULs Current implementation sees NUL as terminator. If users give a message with NUL byte in it (e.g. editor set to save as UTF-16), the new commit message will have NULs. However following operations (displaying or amending a commit for example) will not keep anything after the first NUL. Stop user right when they do this. If NUL is added by mistake, they have their chance to fix. Otherwise, log messages will no longer be text "git log" and friends would grok. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy 2011-12-14 18:19 ` Junio C Hamano @ 2011-12-15 1:04 ` Miles Bader 2011-12-15 1:18 ` Jeff King 1 sibling, 1 reply; 23+ messages in thread From: Miles Bader @ 2011-12-15 1:04 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King > + commitWideEncoding:: > + Advice shown when linkgit::git-commit[1] refuses to > + proceed because there are NULs in commit message. > + Default: true. Although "wide encoding" is a reasonable guess at cause of embedded zero characters (and so a useful term for diagnostic messages, as it can help users identify the problem in their environment which is causing such zero bytes), it's really only a guess in most cases... Shouldn't the variable be named based on what it actually does, which is allow zero-bytes in commit messages...? -Miles -- Cat is power. Cat is peace. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-15 1:04 ` Miles Bader @ 2011-12-15 1:18 ` Jeff King 2011-12-15 3:09 ` Junio C Hamano 0 siblings, 1 reply; 23+ messages in thread From: Jeff King @ 2011-12-15 1:18 UTC (permalink / raw) To: Miles Bader; +Cc: Nguyễn Thái Ngọc Duy, git On Thu, Dec 15, 2011 at 10:04:06AM +0900, Miles Bader wrote: > > + commitWideEncoding:: > > + Advice shown when linkgit::git-commit[1] refuses to > > + proceed because there are NULs in commit message. > > + Default: true. > > Although "wide encoding" is a reasonable guess at cause of embedded > zero characters (and so a useful term for diagnostic messages, as it > can help users identify the problem in their environment which is > causing such zero bytes), it's really only a guess in most cases... > > Shouldn't the variable be named based on what it actually does, which > is allow zero-bytes in commit messages...? I agree, but... Really this variable is overkill. The advice.* subsystem is for silencing hints and warnings from git that you see repeatedly because you are smarter than git, and want to ignore its advice. But in this case, I don't see a user saying "stupid git, of _course_ I want to commit NULs. Stop nagging me". Especially because it is not a warning, but a fatal error. :) So yes, it's verbose, but no, it's not something somebody is going to be so bothered by that they will find the config option to turn it off. Instead, they will stop doing the bad thing and never see it again. At best this config option is useless, and at worst it clutters the advice.* namespace, making it harder for people to find the advice option they _do_ want to turn off). Perhaps it should just be dropped. -Peff ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL 2011-12-15 1:18 ` Jeff King @ 2011-12-15 3:09 ` Junio C Hamano 0 siblings, 0 replies; 23+ messages in thread From: Junio C Hamano @ 2011-12-15 3:09 UTC (permalink / raw) To: Jeff King; +Cc: Miles Bader, Nguyễn Thái Ngọc Duy, git Jeff King <peff@peff.net> writes: > Perhaps it should just be dropped. Thanks. I have nothing to add---you said everything that needs to be said. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-01-03 20:03 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy 2011-12-13 17:59 ` Jeff King 2011-12-14 5:23 ` Miles Bader 2011-12-14 7:17 ` Jeff King 2012-01-01 16:27 ` Drew Northup 2012-01-03 20:03 ` Jeff King 2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy 2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy 2011-12-14 18:12 ` Junio C Hamano 2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy 2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy 2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano 2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy 2011-12-14 18:13 ` Junio C Hamano 2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy 2011-12-14 18:19 ` Junio C Hamano 2011-12-14 18:29 ` Jeff King 2011-12-15 18:46 ` Junio C Hamano 2011-12-15 19:35 ` Junio C Hamano 2011-12-15 1:04 ` Miles Bader 2011-12-15 1:18 ` Jeff King 2011-12-15 3:09 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).