* [PATCH resend] Do not create commits whose message contains NUL
@ 2011-12-13 11:56 Nguyễn Thái Ngọc Duy
2011-12-13 17:59 ` Jeff King
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
0 siblings, 2 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-13 11:56 UTC (permalink / raw)
To: git; +Cc: Nguyễn Thái Ngọc Duy
We assume that the commit log messages are uninterpreted sequences of
non-NUL bytes (see Documentation/i18n.txt). However the assumption
does not really stand out and it's quite easy to set an editor to save
in a NUL-included encoding. Currently we silently cut at the first NUL
we see.
Make it more obvious that NUL is not welcome by refusing to create
such commits. Those who deliberately want to create them can still do
with hash-object.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
This is from UTF-16 in commit message discussion [1] a few months
ago. I don't want to resurrect the discussion again. However I think
it's a good idea to stop users from shooting themselves in this case,
especially at porcelain level.
There were no comments on this patch previously. So, any comments
this time ? Should I drop it?
[1] http://thread.gmane.org/gmane.comp.version-control.git/184123/focus=184335
commit.c | 3 +++
t/t3900-i18n-commit.sh | 6 ++++++
t/t3900/UTF-16.txt | Bin 0 -> 32 bytes
3 files changed, 9 insertions(+), 0 deletions(-)
create mode 100644 t/t3900/UTF-16.txt
diff --git a/commit.c b/commit.c
index d67b8c7..0775eec 100644
--- a/commit.c
+++ b/commit.c
@@ -855,6 +855,9 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,
assert_sha1_type(tree, OBJ_TREE);
+ if (strlen(msg) < msg_len)
+ die(_("cannot commit with NUL in commit message"));
+
/* Not having i18n.commitencoding is the same as having utf-8 */
encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index 1f62c15..d48a7c0 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -34,6 +34,12 @@ test_expect_success 'no encoding header for base case' '
test z = "z$E"
'
+test_expect_failure 'UTF-16 refused because of NULs' '
+ echo UTF-16 >F &&
+ git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt
+'
+
+
for H in ISO8859-1 eucJP ISO-2022-JP
do
test_expect_success "$H setup" '
diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt
new file mode 100644
index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb
GIT binary patch
literal 32
mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp<
literal 0
HcmV?d00001
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL
2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
@ 2011-12-13 17:59 ` Jeff King
2011-12-14 5:23 ` Miles Bader
2012-01-01 16:27 ` Drew Northup
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
1 sibling, 2 replies; 23+ messages in thread
From: Jeff King @ 2011-12-13 17:59 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
On Tue, Dec 13, 2011 at 06:56:08PM +0700, Nguyen Thai Ngoc Duy wrote:
> We assume that the commit log messages are uninterpreted sequences of
> non-NUL bytes (see Documentation/i18n.txt). However the assumption
> does not really stand out and it's quite easy to set an editor to save
> in a NUL-included encoding. Currently we silently cut at the first NUL
> we see.
>
> Make it more obvious that NUL is not welcome by refusing to create
> such commits. Those who deliberately want to create them can still do
> with hash-object.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> This is from UTF-16 in commit message discussion [1] a few months
> ago. I don't want to resurrect the discussion again. However I think
> it's a good idea to stop users from shooting themselves in this case,
> especially at porcelain level.
>
> There were no comments on this patch previously. So, any comments
> this time ? Should I drop it?
I think this is a sane thing to do. Having thought about and
experimented a little with utf-16 in the past few months, I really don't
see how you could be disrupting anybody's workflow. utf-16 messages get
butchered so badly already; we are much better off letting the user know
of the problem as soon as possible.
It looks like we already have a check for is_utf8, and this is not
failing that check. I guess because is_utf8 takes a NUL-terminated
buffer, so it simply sees the truncated result (i.e., depending on
endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check
only "f"). We could make is_utf8 take a length parameter to be more
accurate, and then it would catch this.
However, I think that's not quite what we want. We only check is_utf8 if
the encoding field is not set. And really, we want to reject NULs no
matter _which_ encoding they've set, because git simply doesn't handle
them properly.
> diff --git a/commit.c b/commit.c
> index d67b8c7..0775eec 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -855,6 +855,9 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,
Hmm. My version of commit_tree does not have a "msg_len" parameter, nor
do I have d67b8c7. Is there some refactoring patch this is based on that
I missed?
> + if (strlen(msg) < msg_len)
> + die(_("cannot commit with NUL in commit message"));
> +
Two nits:
1. For some reason, checking strlen(msg) seems a subtle way of looking
for NULs in a buffer. I would have found:
if (memchr(msg, '\0', msglen))
much more obvious. But perhaps it is just me. Certainly not a big
deal either way.
2. The error message could be a little friendlier. The likely reason
for NULs is a bogus encoding setting in the user's editor. We
already have a nice "your message isn't utf-8" message. Though it
does suggest setting i18n.commitencoding, which probably _isn't_
the solution here (since their encoding clearly isn't supported).
But maybe it would be nicer to say something like:
error: your commit message contains NUL characters.
hint: This is often caused by using multibyte encodings such as
hint: UTF-16. Please check your editor settings.
We could even go further and detect some common NUL-containing
encodings, but I don't think it's worth the effort.
> diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt
> new file mode 100644
> index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb
> GIT binary patch
> literal 32
> mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp<
I was disappointed not to find a secret message. :)
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL
2011-12-13 17:59 ` Jeff King
@ 2011-12-14 5:23 ` Miles Bader
2011-12-14 7:17 ` Jeff King
2012-01-01 16:27 ` Drew Northup
1 sibling, 1 reply; 23+ messages in thread
From: Miles Bader @ 2011-12-14 5:23 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git
Jeff King <peff@peff.net> writes:
> But maybe it would be nicer to say something like:
>
> error: your commit message contains NUL characters.
> hint: This is often caused by using multibyte encodings such as
> hint: UTF-16. Please check your editor settings.
I think the error message with the hint is much better for users, but
isn't the term "multibyte" a little misleading here? It seems like
it's really _wide_ encodings that are generally the culprit.
[UTF-16 of course is particularly nasty in that it uses both units which
are wider than a byte ("wide"), _and_ multiple units per code-point....]
Thanks,
-Miles
--
Quack, n. A murderer without a license.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL
2011-12-14 5:23 ` Miles Bader
@ 2011-12-14 7:17 ` Jeff King
0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2011-12-14 7:17 UTC (permalink / raw)
To: Miles Bader; +Cc: Nguyễn Thái Ngọc Duy, git
On Wed, Dec 14, 2011 at 02:23:29PM +0900, Miles Bader wrote:
> Jeff King <peff@peff.net> writes:
> > But maybe it would be nicer to say something like:
> >
> > error: your commit message contains NUL characters.
> > hint: This is often caused by using multibyte encodings such as
> > hint: UTF-16. Please check your editor settings.
>
> I think the error message with the hint is much better for users, but
> isn't the term "multibyte" a little misleading here? It seems like
> it's really _wide_ encodings that are generally the culprit.
Yeah, wide is probably a better term. I'm not sure it is rigorously
defined anywhere, but in general I think it refers to the set of
encodings that do not care about the embedding of 8-bit ascii bytes as
subsets.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/3] git-commit rejects messages with NULs
2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-13 17:59 ` Jeff King
@ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
` (2 more replies)
1 sibling, 3 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw)
To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy
Hi,
I'm sorry I forgot the patch that makes commit_tree() take message
length. This version rewords the error message and use memchr()
instead.
Nguyễn Thái Ngọc Duy (3):
Make commit_tree() take message length in addition to the commit
message
merge: abort if fails to commit
Do not create commits whose message contains NUL
Documentation/config.txt | 4 ++++
advice.c | 2 ++
advice.h | 1 +
builtin/commit-tree.c | 2 +-
builtin/commit.c | 2 +-
builtin/merge.c | 8 ++++++--
builtin/notes.c | 2 +-
commit.c | 13 +++++++++++--
commit.h | 2 +-
notes-cache.c | 2 +-
notes-merge.c | 8 ++++----
notes-merge.h | 2 +-
t/t3900-i18n-commit.sh | 6 ++++++
t/t3900/UTF-16.txt | Bin 0 -> 32 bytes
14 files changed, 40 insertions(+), 14 deletions(-)
create mode 100644 t/t3900/UTF-16.txt
--
1.7.8.36.g69ee2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/3] Make commit_tree() take message length in addition to the commit message
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
@ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy
2011-12-14 18:12 ` Junio C Hamano
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2 siblings, 2 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw)
To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/commit-tree.c | 2 +-
builtin/commit.c | 2 +-
builtin/merge.c | 4 ++--
builtin/notes.c | 2 +-
commit.c | 4 ++--
commit.h | 2 +-
notes-cache.c | 2 +-
notes-merge.c | 8 ++++----
notes-merge.h | 2 +-
9 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index d083795..8fa384f 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
if (strbuf_read(&buffer, 0, 0) < 0)
die_errno("git commit-tree: failed to read");
- if (commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) {
+ if (commit_tree(buffer.buf, buffer.len, tree_sha1, parents, commit_sha1, NULL)) {
strbuf_release(&buffer);
return 1;
}
diff --git a/builtin/commit.c b/builtin/commit.c
index 8f2bebe..ce0e47f 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1483,7 +1483,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
exit(1);
}
- if (commit_tree(sb.buf, active_cache_tree->sha1, parents, sha1,
+ if (commit_tree(sb.buf, sb.len, active_cache_tree->sha1, parents, sha1,
author_ident.buf)) {
rollback_index_files();
die(_("failed to write commit object"));
diff --git a/builtin/merge.c b/builtin/merge.c
index 2870a6a..df4548a 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -913,7 +913,7 @@ static int merge_trivial(struct commit *head)
parent->next->item = remoteheads->item;
parent->next->next = NULL;
prepare_to_commit();
- commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL);
+ commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL);
finish(head, result_commit, "In-index merge");
drop_save();
return 0;
@@ -944,7 +944,7 @@ static int finish_automerge(struct commit *head,
strbuf_addch(&merge_msg, '\n');
prepare_to_commit();
free_commit_list(remoteheads);
- commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL);
+ commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, result_commit, NULL);
strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy);
finish(head, result_commit, buf.buf);
strbuf_release(&buf);
diff --git a/builtin/notes.c b/builtin/notes.c
index f8e437d..d665459 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -306,7 +306,7 @@ void commit_notes(struct notes_tree *t, const char *msg)
if (buf.buf[buf.len - 1] != '\n')
strbuf_addch(&buf, '\n'); /* Make sure msg ends with newline */
- create_notes_commit(t, NULL, buf.buf + 7, commit_sha1);
+ create_notes_commit(t, NULL, buf.buf + 7, buf.len - 7, commit_sha1);
update_ref(buf.buf, t->ref, commit_sha1, NULL, 0, DIE_ON_ERR);
strbuf_release(&buf);
diff --git a/commit.c b/commit.c
index 73b7e00..d67b8c7 100644
--- a/commit.c
+++ b/commit.c
@@ -845,7 +845,7 @@ static const char commit_utf8_warn[] =
"You may want to amend it after fixing the message, or set the config\n"
"variable i18n.commitencoding to the encoding your project uses.\n";
-int commit_tree(const char *msg, unsigned char *tree,
+int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,
struct commit_list *parents, unsigned char *ret,
const char *author)
{
@@ -884,7 +884,7 @@ int commit_tree(const char *msg, unsigned char *tree,
strbuf_addch(&buffer, '\n');
/* And add the comment */
- strbuf_addstr(&buffer, msg);
+ strbuf_add(&buffer, msg, msg_len);
/* And check the encoding */
if (encoding_is_utf8 && !is_utf8(buffer.buf))
diff --git a/commit.h b/commit.h
index 009b113..1acaf53 100644
--- a/commit.h
+++ b/commit.h
@@ -181,7 +181,7 @@ static inline int single_parent(struct commit *commit)
struct commit_list *reduce_heads(struct commit_list *heads);
-extern int commit_tree(const char *msg, unsigned char *tree,
+extern int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,
struct commit_list *parents, unsigned char *ret,
const char *author);
diff --git a/notes-cache.c b/notes-cache.c
index 4c8984e..04a5698 100644
--- a/notes-cache.c
+++ b/notes-cache.c
@@ -56,7 +56,7 @@ int notes_cache_write(struct notes_cache *c)
if (write_notes_tree(&c->tree, tree_sha1))
return -1;
- if (commit_tree(c->validity, tree_sha1, NULL, commit_sha1, NULL) < 0)
+ if (commit_tree(c->validity, strlen(c->validity), tree_sha1, NULL, commit_sha1, NULL) < 0)
return -1;
if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL,
0, QUIET_ON_ERR) < 0)
diff --git a/notes-merge.c b/notes-merge.c
index ce10aac..b3baaf4 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -530,7 +530,7 @@ static int merge_from_diffs(struct notes_merge_options *o,
}
void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
- const char *msg, unsigned char *result_sha1)
+ const char *msg, size_t msg_len, unsigned char *result_sha1)
{
unsigned char tree_sha1[20];
@@ -551,7 +551,7 @@ void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
/* else: t->ref points to nothing, assume root/orphan commit */
}
- if (commit_tree(msg, tree_sha1, parents, result_sha1, NULL))
+ if (commit_tree(msg, msg_len, tree_sha1, parents, result_sha1, NULL))
die("Failed to commit notes tree to database");
}
@@ -669,7 +669,7 @@ int notes_merge(struct notes_merge_options *o,
commit_list_insert(remote, &parents); /* LIFO order */
commit_list_insert(local, &parents);
create_notes_commit(local_tree, parents, o->commit_msg.buf,
- result_sha1);
+ o->commit_msg.len, result_sha1);
}
found_result:
@@ -734,7 +734,7 @@ int notes_merge_commit(struct notes_merge_options *o,
}
create_notes_commit(partial_tree, partial_commit->parents, msg,
- result_sha1);
+ strlen(msg), result_sha1);
if (o->verbosity >= 4)
printf("Finalized notes merge commit: %s\n",
sha1_to_hex(result_sha1));
diff --git a/notes-merge.h b/notes-merge.h
index 168a672..fd52988 100644
--- a/notes-merge.h
+++ b/notes-merge.h
@@ -37,7 +37,7 @@ void init_notes_merge_options(struct notes_merge_options *o);
* The resulting commit SHA1 is stored in result_sha1.
*/
void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
- const char *msg, unsigned char *result_sha1);
+ const char *msg, size_t msg_len, unsigned char *result_sha1);
/*
* Merge notes from o->remote_ref into o->local_ref
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/3] merge: abort if fails to commit
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
@ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy
2011-12-14 18:13 ` Junio C Hamano
2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2 siblings, 1 reply; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw)
To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/merge.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/builtin/merge.c b/builtin/merge.c
index df4548a..e57eefa 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -913,7 +913,9 @@ static int merge_trivial(struct commit *head)
parent->next->item = remoteheads->item;
parent->next->next = NULL;
prepare_to_commit();
- commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL);
+ if (commit_tree(merge_msg.buf, merge_msg.len,
+ result_tree, parent, result_commit, NULL))
+ die(_("failed to write commit object"));
finish(head, result_commit, "In-index merge");
drop_save();
return 0;
@@ -944,7 +946,9 @@ static int finish_automerge(struct commit *head,
strbuf_addch(&merge_msg, '\n');
prepare_to_commit();
free_commit_list(remoteheads);
- commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, result_commit, NULL);
+ if (commit_tree(merge_msg.buf, merge_msg.len,
+ result_tree, parents, result_commit, NULL))
+ die(_("failed to write commit object"));
strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy);
finish(head, result_commit, buf.buf);
strbuf_release(&buf);
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy
@ 2011-12-14 14:08 ` Nguyễn Thái Ngọc Duy
2011-12-14 18:19 ` Junio C Hamano
2011-12-15 1:04 ` Miles Bader
2 siblings, 2 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-14 14:08 UTC (permalink / raw)
To: git; +Cc: Jeff King, Miles Bader, Nguyễn Thái Ngọc Duy
We assume that the commit log messages are uninterpreted sequences of
non-NUL bytes (see Documentation/i18n.txt). However the assumption
does not really stand out and it's quite easy to set an editor to save
in a NUL-included encoding. Currently we silently cut at the first NUL
we see.
Make it more obvious that NUL is not welcome by refusing to create
such commits. Those who deliberately want to create them can still do
with hash-object.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Documentation/config.txt | 4 ++++
advice.c | 2 ++
advice.h | 1 +
commit.c | 9 +++++++++
t/t3900-i18n-commit.sh | 6 ++++++
t/t3900/UTF-16.txt | Bin 0 -> 32 bytes
6 files changed, 22 insertions(+), 0 deletions(-)
create mode 100644 t/t3900/UTF-16.txt
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5a841da..daf57c2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -144,6 +144,10 @@ advice.*::
Advice shown when you used linkgit::git-checkout[1] to
move to the detach HEAD state, to instruct how to create
a local branch after the fact. Default: true.
+ commitWideEncoding::
+ Advice shown when linkgit::git-commit[1] refuses to
+ proceed because there are NULs in commit message.
+ Default: true.
--
core.fileMode::
diff --git a/advice.c b/advice.c
index e02e632..130949e 100644
--- a/advice.c
+++ b/advice.c
@@ -6,6 +6,7 @@ int advice_commit_before_merge = 1;
int advice_resolve_conflict = 1;
int advice_implicit_identity = 1;
int advice_detached_head = 1;
+int advice_commmit_wide_encoding = 1;
static struct {
const char *name;
@@ -17,6 +18,7 @@ static struct {
{ "resolveconflict", &advice_resolve_conflict },
{ "implicitidentity", &advice_implicit_identity },
{ "detachedhead", &advice_detached_head },
+ { "commitwideencoding", &advice_commmit_wide_encoding },
};
void advise(const char *advice, ...)
diff --git a/advice.h b/advice.h
index e5d0af7..d913bdb 100644
--- a/advice.h
+++ b/advice.h
@@ -9,6 +9,7 @@ extern int advice_commit_before_merge;
extern int advice_resolve_conflict;
extern int advice_implicit_identity;
extern int advice_detached_head;
+extern int advice_commmit_wide_encoding;
int git_default_advice_config(const char *var, const char *value);
void advise(const char *advice, ...);
diff --git a/commit.c b/commit.c
index d67b8c7..59e5bce 100644
--- a/commit.c
+++ b/commit.c
@@ -855,6 +855,15 @@ int commit_tree(const char *msg, size_t msg_len, unsigned char *tree,
assert_sha1_type(tree, OBJ_TREE);
+ if (memchr(msg, '\0', msg_len)) {
+ error(_("your commit message contains NUL characters."));
+ if (advice_commmit_wide_encoding) {
+ advise(_("This is often caused by using wide encodings such as"));
+ advise(_("UTF-16. Please check your editor settings."));
+ }
+ return -1;
+ }
+
/* Not having i18n.commitencoding is the same as having utf-8 */
encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index 1f62c15..d48a7c0 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -34,6 +34,12 @@ test_expect_success 'no encoding header for base case' '
test z = "z$E"
'
+test_expect_failure 'UTF-16 refused because of NULs' '
+ echo UTF-16 >F &&
+ git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt
+'
+
+
for H in ISO8859-1 eucJP ISO-2022-JP
do
test_expect_success "$H setup" '
diff --git a/t/t3900/UTF-16.txt b/t/t3900/UTF-16.txt
new file mode 100644
index 0000000000000000000000000000000000000000..53296be684253f40964c0604be7fa7ff12e200cb
GIT binary patch
literal 32
mcmezOpWz6@X@-jo=NYasZ~@^#h9rjP3@HpR7}6Nh8Mpw;r3yp<
literal 0
HcmV?d00001
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/3] Make commit_tree() take message length in addition to the commit message
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
@ 2011-12-14 18:12 ` Junio C Hamano
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2011-12-14 18:12 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Justification?
As all 3 primary users of this API feed strbuf.buf to the function, it
would make more sense to change the first parameter to a pointer to a
strbuf, no?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/3] merge: abort if fails to commit
2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy
@ 2011-12-14 18:13 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2011-12-14 18:13 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
> builtin/merge.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/merge.c b/builtin/merge.c
> index df4548a..e57eefa 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -913,7 +913,9 @@ static int merge_trivial(struct commit *head)
> parent->next->item = remoteheads->item;
> parent->next->next = NULL;
> prepare_to_commit();
> - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL);
> + if (commit_tree(merge_msg.buf, merge_msg.len,
> + result_tree, parent, result_commit, NULL))
> + die(_("failed to write commit object"));
> finish(head, result_commit, "In-index merge");
> drop_save();
> return 0;
Should we die immediately, or should we do some clean-ups after ourselves
before doing so?
In any case, this is a good change that shouldn't be taken hostage to the
unrelated change in patch [1/3].
Thanks.
> @@ -944,7 +946,9 @@ static int finish_automerge(struct commit *head,
> strbuf_addch(&merge_msg, '\n');
> prepare_to_commit();
> free_commit_list(remoteheads);
> - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parents, result_commit, NULL);
> + if (commit_tree(merge_msg.buf, merge_msg.len,
> + result_tree, parents, result_commit, NULL))
> + die(_("failed to write commit object"));
> strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy);
> finish(head, result_commit, buf.buf);
> strbuf_release(&buf);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
@ 2011-12-14 18:19 ` Junio C Hamano
2011-12-14 18:29 ` Jeff King
2011-12-15 1:04 ` Miles Bader
1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2011-12-14 18:19 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader
Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
> We assume that the commit log messages are uninterpreted sequences of
> non-NUL bytes (see Documentation/i18n.txt). However the assumption
> does not really stand out and it's quite easy to set an editor to save
> in a NUL-included encoding. Currently we silently cut at the first NUL
> we see.
>
> Make it more obvious that NUL is not welcome by refusing to create
> such commits. Those who deliberately want to create them can still do
> with hash-object.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Limiting the Porcelain layer to deal only with reasonable text encodings
(yes, I am declaring that utf16 is not among them) is perfectly fine, but
I was somehow hoping that you would allow the option for the low-level
function commit_tree() to create a commit object with binary blob in the
body part, especially after seeing the patch 1/3 to do so.
Certainly that kind of usage would not give the binary blob literally in
"git log" output, but it is with or without the issue around NUL byte. A
custom program linked with commit.c to call commit_tree() may not be using
the data structure to store anything that is meant to be read by "git log"
to begin with.
Not a strong veto at all, just throwing out something to think about.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-14 18:19 ` Junio C Hamano
@ 2011-12-14 18:29 ` Jeff King
2011-12-15 18:46 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-12-14 18:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyễn Thái Ngọc Duy, git, Miles Bader
On Wed, Dec 14, 2011 at 10:19:16AM -0800, Junio C Hamano wrote:
> Limiting the Porcelain layer to deal only with reasonable text encodings
> (yes, I am declaring that utf16 is not among them) is perfectly fine, but
> I was somehow hoping that you would allow the option for the low-level
> function commit_tree() to create a commit object with binary blob in the
> body part, especially after seeing the patch 1/3 to do so.
>
> Certainly that kind of usage would not give the binary blob literally in
> "git log" output, but it is with or without the issue around NUL byte. A
> custom program linked with commit.c to call commit_tree() may not be using
> the data structure to store anything that is meant to be read by "git log"
> to begin with.
I'm happy to ignore custom programs linking against internal git code,
but what should "git commit-tree" do?
My gut feeling is that it should store the literal binary contents.
However, I don't think this has ever been the case. Even in the initial
version of commit-tree.c, we read the input line-by-line and sprintf it
into place.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-14 18:19 ` Junio C Hamano
@ 2011-12-15 1:04 ` Miles Bader
2011-12-15 1:18 ` Jeff King
1 sibling, 1 reply; 23+ messages in thread
From: Miles Bader @ 2011-12-15 1:04 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King
> + commitWideEncoding::
> + Advice shown when linkgit::git-commit[1] refuses to
> + proceed because there are NULs in commit message.
> + Default: true.
Although "wide encoding" is a reasonable guess at cause of embedded
zero characters (and so a useful term for diagnostic messages, as it
can help users identify the problem in their environment which is
causing such zero bytes), it's really only a guess in most cases...
Shouldn't the variable be named based on what it actually does, which
is allow zero-bytes in commit messages...?
-Miles
--
Cat is power. Cat is peace.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-15 1:04 ` Miles Bader
@ 2011-12-15 1:18 ` Jeff King
2011-12-15 3:09 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2011-12-15 1:18 UTC (permalink / raw)
To: Miles Bader; +Cc: Nguyễn Thái Ngọc Duy, git
On Thu, Dec 15, 2011 at 10:04:06AM +0900, Miles Bader wrote:
> > + commitWideEncoding::
> > + Advice shown when linkgit::git-commit[1] refuses to
> > + proceed because there are NULs in commit message.
> > + Default: true.
>
> Although "wide encoding" is a reasonable guess at cause of embedded
> zero characters (and so a useful term for diagnostic messages, as it
> can help users identify the problem in their environment which is
> causing such zero bytes), it's really only a guess in most cases...
>
> Shouldn't the variable be named based on what it actually does, which
> is allow zero-bytes in commit messages...?
I agree, but...
Really this variable is overkill. The advice.* subsystem is for
silencing hints and warnings from git that you see repeatedly because
you are smarter than git, and want to ignore its advice.
But in this case, I don't see a user saying "stupid git, of _course_ I
want to commit NULs. Stop nagging me". Especially because it is not a
warning, but a fatal error. :)
So yes, it's verbose, but no, it's not something somebody is going to be
so bothered by that they will find the config option to turn it off.
Instead, they will stop doing the bad thing and never see it again. At
best this config option is useless, and at worst it clutters the
advice.* namespace, making it harder for people to find the advice
option they _do_ want to turn off).
Perhaps it should just be dropped.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-15 1:18 ` Jeff King
@ 2011-12-15 3:09 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2011-12-15 3:09 UTC (permalink / raw)
To: Jeff King; +Cc: Miles Bader, Nguyễn Thái Ngọc Duy, git
Jeff King <peff@peff.net> writes:
> Perhaps it should just be dropped.
Thanks. I have nothing to add---you said everything that needs to be said.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 1/3] merge: abort if fails to commit
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
2011-12-14 18:12 ` Junio C Hamano
@ 2011-12-15 13:47 ` Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy
` (2 more replies)
1 sibling, 3 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 UTC (permalink / raw)
To: git
Cc: Jeff King, Miles Bader, Junio C Hamano,
Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
2011/12/15 Junio C Hamano <gitster@pobox.com>:
>> - commit_tree(merge_msg.buf, merge_msg.len, result_tree, parent, result_commit, NULL);
>> + if (commit_tree(merge_msg.buf, merge_msg.len,
>> + result_tree, parent, result_commit, NULL))
>> + die(_("failed to write commit object"));
>> finish(head, result_commit, "In-index merge");
>> drop_save();
>> return 0;
>
> Should we die immediately, or should we do some clean-ups after ourselves
> before doing so?
I'm not sure. I had a quick look over the command and it seems we do
not need to do any clean-ups. But I'm not familiar with the command
anyway..
> In any case, this is a good change that shouldn't be taken hostage to the
> unrelated change in patch [1/3].
Moved it up so it you can cherry-pick it independently.
builtin/merge.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/builtin/merge.c b/builtin/merge.c
index 2870a6a..27576c0 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -913,7 +913,8 @@ static int merge_trivial(struct commit *head)
parent->next->item = remoteheads->item;
parent->next->next = NULL;
prepare_to_commit();
- commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL);
+ if (commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL))
+ die(_("failed to write commit object"));
finish(head, result_commit, "In-index merge");
drop_save();
return 0;
@@ -944,7 +945,8 @@ static int finish_automerge(struct commit *head,
strbuf_addch(&merge_msg, '\n');
prepare_to_commit();
free_commit_list(remoteheads);
- commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL);
+ if (commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL))
+ die(_("failed to write commit object"));
strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy);
finish(head, result_commit, buf.buf);
strbuf_release(&buf);
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 2/3] Convert commit_tree() to take strbuf as message
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
@ 2011-12-15 13:47 ` Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy
2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano
2 siblings, 0 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 UTC (permalink / raw)
To: git
Cc: Jeff King, Miles Bader, Junio C Hamano,
Nguyễn Thái Ngọc Duy
Because strbuf provides message length, we can create commits that
include NULs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
builtin/commit-tree.c | 2 +-
builtin/commit.c | 2 +-
builtin/merge.c | 4 ++--
builtin/notes.c | 4 ++--
commit.c | 4 ++--
commit.h | 2 +-
notes-cache.c | 5 ++++-
notes-merge.c | 10 ++++++----
notes-merge.h | 2 +-
9 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index d083795..0895861 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -56,7 +56,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
if (strbuf_read(&buffer, 0, 0) < 0)
die_errno("git commit-tree: failed to read");
- if (commit_tree(buffer.buf, tree_sha1, parents, commit_sha1, NULL)) {
+ if (commit_tree(&buffer, tree_sha1, parents, commit_sha1, NULL)) {
strbuf_release(&buffer);
return 1;
}
diff --git a/builtin/commit.c b/builtin/commit.c
index 8f2bebe..849151e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1483,7 +1483,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
exit(1);
}
- if (commit_tree(sb.buf, active_cache_tree->sha1, parents, sha1,
+ if (commit_tree(&sb, active_cache_tree->sha1, parents, sha1,
author_ident.buf)) {
rollback_index_files();
die(_("failed to write commit object"));
diff --git a/builtin/merge.c b/builtin/merge.c
index 27576c0..e066bf1 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -913,7 +913,7 @@ static int merge_trivial(struct commit *head)
parent->next->item = remoteheads->item;
parent->next->next = NULL;
prepare_to_commit();
- if (commit_tree(merge_msg.buf, result_tree, parent, result_commit, NULL))
+ if (commit_tree(&merge_msg, result_tree, parent, result_commit, NULL))
die(_("failed to write commit object"));
finish(head, result_commit, "In-index merge");
drop_save();
@@ -945,7 +945,7 @@ static int finish_automerge(struct commit *head,
strbuf_addch(&merge_msg, '\n');
prepare_to_commit();
free_commit_list(remoteheads);
- if (commit_tree(merge_msg.buf, result_tree, parents, result_commit, NULL))
+ if (commit_tree(&merge_msg, result_tree, parents, result_commit, NULL))
die(_("failed to write commit object"));
strbuf_addf(&buf, "Merge made by the '%s' strategy.", wt_strategy);
finish(head, result_commit, buf.buf);
diff --git a/builtin/notes.c b/builtin/notes.c
index f8e437d..5e32548 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -301,12 +301,12 @@ void commit_notes(struct notes_tree *t, const char *msg)
return; /* don't have to commit an unchanged tree */
/* Prepare commit message and reflog message */
- strbuf_addstr(&buf, "notes: "); /* commit message starts at index 7 */
strbuf_addstr(&buf, msg);
if (buf.buf[buf.len - 1] != '\n')
strbuf_addch(&buf, '\n'); /* Make sure msg ends with newline */
- create_notes_commit(t, NULL, buf.buf + 7, commit_sha1);
+ create_notes_commit(t, NULL, &buf, commit_sha1);
+ strbuf_insert(&buf, 0, "notes: ", 7); /* commit message starts at index 7 */
update_ref(buf.buf, t->ref, commit_sha1, NULL, 0, DIE_ON_ERR);
strbuf_release(&buf);
diff --git a/commit.c b/commit.c
index 73b7e00..0a214a6 100644
--- a/commit.c
+++ b/commit.c
@@ -845,7 +845,7 @@ static const char commit_utf8_warn[] =
"You may want to amend it after fixing the message, or set the config\n"
"variable i18n.commitencoding to the encoding your project uses.\n";
-int commit_tree(const char *msg, unsigned char *tree,
+int commit_tree(const struct strbuf *msg, unsigned char *tree,
struct commit_list *parents, unsigned char *ret,
const char *author)
{
@@ -884,7 +884,7 @@ int commit_tree(const char *msg, unsigned char *tree,
strbuf_addch(&buffer, '\n');
/* And add the comment */
- strbuf_addstr(&buffer, msg);
+ strbuf_addbuf(&buffer, msg);
/* And check the encoding */
if (encoding_is_utf8 && !is_utf8(buffer.buf))
diff --git a/commit.h b/commit.h
index 009b113..5cf46b2 100644
--- a/commit.h
+++ b/commit.h
@@ -181,7 +181,7 @@ static inline int single_parent(struct commit *commit)
struct commit_list *reduce_heads(struct commit_list *heads);
-extern int commit_tree(const char *msg, unsigned char *tree,
+extern int commit_tree(const struct strbuf *msg, unsigned char *tree,
struct commit_list *parents, unsigned char *ret,
const char *author);
diff --git a/notes-cache.c b/notes-cache.c
index 4c8984e..bea013e 100644
--- a/notes-cache.c
+++ b/notes-cache.c
@@ -48,6 +48,7 @@ int notes_cache_write(struct notes_cache *c)
{
unsigned char tree_sha1[20];
unsigned char commit_sha1[20];
+ struct strbuf msg = STRBUF_INIT;
if (!c || !c->tree.initialized || !c->tree.ref || !*c->tree.ref)
return -1;
@@ -56,7 +57,9 @@ int notes_cache_write(struct notes_cache *c)
if (write_notes_tree(&c->tree, tree_sha1))
return -1;
- if (commit_tree(c->validity, tree_sha1, NULL, commit_sha1, NULL) < 0)
+ strbuf_attach(&msg, c->validity,
+ strlen(c->validity), strlen(c->validity) + 1);
+ if (commit_tree(&msg, tree_sha1, NULL, commit_sha1, NULL) < 0)
return -1;
if (update_ref("update notes cache", c->tree.ref, commit_sha1, NULL,
0, QUIET_ON_ERR) < 0)
diff --git a/notes-merge.c b/notes-merge.c
index ce10aac..b5a36ac 100644
--- a/notes-merge.c
+++ b/notes-merge.c
@@ -530,7 +530,7 @@ static int merge_from_diffs(struct notes_merge_options *o,
}
void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
- const char *msg, unsigned char *result_sha1)
+ const struct strbuf *msg, unsigned char *result_sha1)
{
unsigned char tree_sha1[20];
@@ -668,7 +668,7 @@ int notes_merge(struct notes_merge_options *o,
struct commit_list *parents = NULL;
commit_list_insert(remote, &parents); /* LIFO order */
commit_list_insert(local, &parents);
- create_notes_commit(local_tree, parents, o->commit_msg.buf,
+ create_notes_commit(local_tree, parents, &o->commit_msg,
result_sha1);
}
@@ -695,7 +695,8 @@ int notes_merge_commit(struct notes_merge_options *o,
struct dir_struct dir;
char *path = xstrdup(git_path(NOTES_MERGE_WORKTREE "/"));
int path_len = strlen(path), i;
- const char *msg = strstr(partial_commit->buffer, "\n\n");
+ char *msg = strstr(partial_commit->buffer, "\n\n");
+ struct strbuf sb_msg = STRBUF_INIT;
if (o->verbosity >= 3)
printf("Committing notes in notes merge worktree at %.*s\n",
@@ -733,7 +734,8 @@ int notes_merge_commit(struct notes_merge_options *o,
sha1_to_hex(obj_sha1), sha1_to_hex(blob_sha1));
}
- create_notes_commit(partial_tree, partial_commit->parents, msg,
+ strbuf_attach(&sb_msg, msg, strlen(msg), strlen(msg) + 1);
+ create_notes_commit(partial_tree, partial_commit->parents, &sb_msg,
result_sha1);
if (o->verbosity >= 4)
printf("Finalized notes merge commit: %s\n",
diff --git a/notes-merge.h b/notes-merge.h
index 168a672..0c11b17 100644
--- a/notes-merge.h
+++ b/notes-merge.h
@@ -37,7 +37,7 @@ void init_notes_merge_options(struct notes_merge_options *o);
* The resulting commit SHA1 is stored in result_sha1.
*/
void create_notes_commit(struct notes_tree *t, struct commit_list *parents,
- const char *msg, unsigned char *result_sha1);
+ const struct strbuf *msg, unsigned char *result_sha1);
/*
* Merge notes from o->remote_ref into o->local_ref
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v2 3/3] commit: refuse commit messages that contain NULs
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy
@ 2011-12-15 13:47 ` Nguyễn Thái Ngọc Duy
2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano
2 siblings, 0 replies; 23+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2011-12-15 13:47 UTC (permalink / raw)
To: git
Cc: Jeff King, Miles Bader, Junio C Hamano,
Nguyễn Thái Ngọc Duy
Current implementation sees NUL as terminator. If users give a
NUL-included message (e.g. editor accidentally set to save as UTF-16),
the new commit message will have NULs. However following operations
(displaying or amending a commit for example) will not show anything
after the first NUL.
Stop user right when they do this. If NUL is added by mistake, they
have their chance to fix. If they know that they are doing,
commit-tree will gladly commit whatever is given.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
Advice stuff dropped. I realized quite late that commit_tree() is
also used for plumbing commands (also thanks to Junio's comments),
while I wanted to check at porcelain level only. So I moved the check
up to builtin/commit.c. If we need the same check for other commands,
which I doubt, similar checks can be added.
builtin/commit.c | 7 +++++++
t/t3900-i18n-commit.sh | 6 ++++++
2 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index 849151e..5db7673 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1483,6 +1483,13 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
exit(1);
}
+ if (memchr(sb.buf, '\0', sb.len)) {
+ rollback_index_files();
+ die(_("your commit message contains NUL characters.\n"
+ "hint: This is often caused by using wide encodings such as\n"
+ "hint: UTF-16. Please check your editor settings."));
+ }
+
if (commit_tree(&sb, active_cache_tree->sha1, parents, sha1,
author_ident.buf)) {
rollback_index_files();
diff --git a/t/t3900-i18n-commit.sh b/t/t3900-i18n-commit.sh
index 1f62c15..d48a7c0 100755
--- a/t/t3900-i18n-commit.sh
+++ b/t/t3900-i18n-commit.sh
@@ -34,6 +34,12 @@ test_expect_success 'no encoding header for base case' '
test z = "z$E"
'
+test_expect_failure 'UTF-16 refused because of NULs' '
+ echo UTF-16 >F &&
+ git commit -a -F "$TEST_DIRECTORY"/t3900/UTF-16.txt
+'
+
+
for H in ISO8859-1 eucJP ISO-2022-JP
do
test_expect_success "$H setup" '
--
1.7.8.36.g69ee2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/3] merge: abort if fails to commit
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy
@ 2011-12-15 18:23 ` Junio C Hamano
2 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2011-12-15 18:23 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git, Jeff King, Miles Bader
Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-14 18:29 ` Jeff King
@ 2011-12-15 18:46 ` Junio C Hamano
2011-12-15 19:35 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2011-12-15 18:46 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git, Miles Bader
Jeff King <peff@peff.net> writes:
> I'm happy to ignore custom programs linking against internal git code,
> but what should "git commit-tree" do?
>
> My gut feeling is that it should store the literal binary contents.
> However, I don't think this has ever been the case. Even in the initial
> version of commit-tree.c, we read the input line-by-line and sprintf it
> into place.
Yeah, you are right. Perhaps we should tweak updated 3/3 to check at the
lower level commit_tree() then.
I've rewrote the log message for 2/3 as follows so we can go either way
;-)
Convert commit_tree() to take strbuf as message
There wan't a way for commit_tree() to notice if the message the caller
prepared contained a NUL byte, as it did not take the length of the
message as a parameter. Use a pointer to a strbuf instead, so that we can
either choose to allow low-level plumbing commands to make commits
that contain NUL byte in its message, or forbid NUL everywhere by
adding the check in commit_tree(), in later patches.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/3] Do not create commits whose message contains NUL
2011-12-15 18:46 ` Junio C Hamano
@ 2011-12-15 19:35 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2011-12-15 19:35 UTC (permalink / raw)
To: git; +Cc: Jeff King, Nguyễn Thái Ngọc Duy, Miles Bader
Junio C Hamano <gitster@pobox.com> writes:
>> My gut feeling is that it should store the literal binary contents.
>> However, I don't think this has ever been the case. Even in the initial
>> version of commit-tree.c, we read the input line-by-line and sprintf it
>> into place.
>
> Yeah, you are right. Perhaps we should tweak updated 3/3 to check at the
> lower level commit_tree() then.
>
> I've rewrote the log message for 2/3 as follows so we can go either way
> ;-)
s/rewrote/rewritten/ obviously...
> Convert commit_tree() to take strbuf as message
>
> There wan't a way for commit_tree() to notice if the message the caller
> prepared contained a NUL byte, as it did not take the length of the
> message as a parameter. Use a pointer to a strbuf instead, so that we can
> either choose to allow low-level plumbing commands to make commits
> that contain NUL byte in its message, or forbid NUL everywhere by
> adding the check in commit_tree(), in later patches.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
And 3/3 looks like this:
commit_tree(): refuse commit messages that contain NULs
Current implementation sees NUL as terminator. If users give a message
with NUL byte in it (e.g. editor set to save as UTF-16), the new commit
message will have NULs. However following operations (displaying or
amending a commit for example) will not keep anything after the first NUL.
Stop user right when they do this. If NUL is added by mistake, they have
their chance to fix. Otherwise, log messages will no longer be text "git
log" and friends would grok.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL
2011-12-13 17:59 ` Jeff King
2011-12-14 5:23 ` Miles Bader
@ 2012-01-01 16:27 ` Drew Northup
2012-01-03 20:03 ` Jeff King
1 sibling, 1 reply; 23+ messages in thread
From: Drew Northup @ 2012-01-01 16:27 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git
On Tue, 2011-12-13 at 12:59 -0500, Jeff King wrote:
> It looks like we already have a check for is_utf8, and this is not
> failing that check. I guess because is_utf8 takes a NUL-terminated
> buffer, so it simply sees the truncated result (i.e., depending on
> endianness, "foo" in utf16 is something like "f\0o\0o\0", so we check
> only "f"). We could make is_utf8 take a length parameter to be more
> accurate, and then it would catch this.
>
> However, I think that's not quite what we want. We only check is_utf8 if
> the encoding field is not set. And really, we want to reject NULs no
> matter _which_ encoding they've set, because git simply doesn't handle
> them properly.
I had already started experimenting with automatically detecting decent
UTF-16 a long while back so that compatible platforms could handle it
appropriately in terms of creating diffs and dealing with newline
munging between platforms. There is no 100% sure-fire check for UTF-16
if you don't already suspect it is possibly UTF-16. If we really want to
check for possible UTF-16 specifically I can scrape out the check I
wrote up and send it along.
The is_utf8 check was not written to detect 100% valid UTF-8 per-se. It
seems to me that it was written as part of the "is this a binary or not"
check in the add/commit path. I have thought for some time that
specifying buffer length in that whole code path would be a good idea
(but I thought that somebody else had taken up that battle while I was
busy dealing with other problems elsewhere), if for no other reason it
would force it to deal with NULs more intelligently.
--
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH resend] Do not create commits whose message contains NUL
2012-01-01 16:27 ` Drew Northup
@ 2012-01-03 20:03 ` Jeff King
0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2012-01-03 20:03 UTC (permalink / raw)
To: Drew Northup; +Cc: Nguyễn Thái Ngọc Duy, git
On Sun, Jan 01, 2012 at 11:27:31AM -0500, Drew Northup wrote:
> I had already started experimenting with automatically detecting decent
> UTF-16 a long while back so that compatible platforms could handle it
> appropriately in terms of creating diffs and dealing with newline
> munging between platforms. There is no 100% sure-fire check for UTF-16
> if you don't already suspect it is possibly UTF-16. If we really want to
> check for possible UTF-16 specifically I can scrape out the check I
> wrote up and send it along.
I also looked into this recently. You can generally detect UTF-16 by the
BOM at the beginning of the file (which will also tell you the
endian-ness). I did a simple test by integrating it into the check for
binary-ness during diffs. However, as I recall, the result wasn't
particularly useful. Some of the diff code wasn't happy with the
embedded NUL bytes (i.e., there is code that assumes that NUL is the end
of a string). Not to mention that ascii newline (0x0a) can appear as
part of other characters in a wide encoding like utf-16. And since git
outputs straight ascii for all of the diff boilerplate, you end up with
a mish-mash of utf-16 and ascii (this is OK with utf-8, of course,
because utf-8 is a superset of ascii).
If anything, I think you would want to do something like "textconv" to
convert the utf-16 into utf-8, then diff that. Git won't do it
automatically based on encoding, but if you know the filenames of the
utf-16 files in your repository, you can do something like:
echo 'foo.txt diff=utf16' >.gitattributes
git config diff.utf16.textconv 'iconv -f utf16 -t utf8'
and get readable diffs. Of course you couldn't use that diff to apply a
patch, though.
I strongly suspect that not many people are really using git for utf-16
files. Git treats them as binary, which makes them unpleasant for
anything except simple storage.
> The is_utf8 check was not written to detect 100% valid UTF-8 per-se. It
> seems to me that it was written as part of the "is this a binary or not"
> check in the add/commit path.
We shouldn't care about binary file content at all in the add or commit
code paths. I would guess we do only if you are using auto-crlf (but
then, I don't think we care about utf8 in that cases, only whether line
endings should be converted or not).
We do check that the commit message itself is utf8, but only to generate
a warning that you should set i81n.commitencoding.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2012-01-03 20:03 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-13 11:56 [PATCH resend] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-13 17:59 ` Jeff King
2011-12-14 5:23 ` Miles Bader
2011-12-14 7:17 ` Jeff King
2012-01-01 16:27 ` Drew Northup
2012-01-03 20:03 ` Jeff King
2011-12-14 14:08 ` [PATCH 0/3] git-commit rejects messages with NULs Nguyễn Thái Ngọc Duy
2011-12-14 14:08 ` [PATCH 1/3] Make commit_tree() take message length in addition to the commit message Nguyễn Thái Ngọc Duy
2011-12-14 18:12 ` Junio C Hamano
2011-12-15 13:47 ` [PATCH v2 1/3] merge: abort if fails to commit Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 2/3] Convert commit_tree() to take strbuf as message Nguyễn Thái Ngọc Duy
2011-12-15 13:47 ` [PATCH v2 3/3] commit: refuse commit messages that contain NULs Nguyễn Thái Ngọc Duy
2011-12-15 18:23 ` [PATCH v2 1/3] merge: abort if fails to commit Junio C Hamano
2011-12-14 14:08 ` [PATCH 2/3] " Nguyễn Thái Ngọc Duy
2011-12-14 18:13 ` Junio C Hamano
2011-12-14 14:08 ` [PATCH 3/3] Do not create commits whose message contains NUL Nguyễn Thái Ngọc Duy
2011-12-14 18:19 ` Junio C Hamano
2011-12-14 18:29 ` Jeff King
2011-12-15 18:46 ` Junio C Hamano
2011-12-15 19:35 ` Junio C Hamano
2011-12-15 1:04 ` Miles Bader
2011-12-15 1:18 ` Jeff King
2011-12-15 3:09 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).