* git-mailinfo '-u' argument should be default.
@ 2006-05-12 16:46 David Woodhouse
2007-01-09 14:03 ` [PATCH] " David Woodhouse
0 siblings, 1 reply; 8+ messages in thread
From: David Woodhouse @ 2006-05-12 16:46 UTC (permalink / raw)
To: git
If you apply a patch with 'git-am', it takes the raw content of the
From: header, in whatever character set that happens to be, and puts
that content, untagged, into the commit object.
That is almost _never_ the right thing to do, surely? Raw data in
untagged character sets is marginally better than line noise.
We should default to the '-u' behaviour, which converts to the character
set which the git repo is stored in. It's not just for converting to
UTF-8, although it looks like it was in the past. Now, however, it
converts to the character set defined in the configuration. So even if
it's a Luddite who for some reason is sticking with an obsolete
character set, it gets that right.
The only option which makes sense _other_ than that would be to just
stick the RFC2047-encoded original From: header into the commit, surely?
--
dwmw2
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] Re: git-mailinfo '-u' argument should be default.
2006-05-12 16:46 git-mailinfo '-u' argument should be default David Woodhouse
@ 2007-01-09 14:03 ` David Woodhouse
2007-01-09 14:08 ` Johannes Schindelin
2007-01-09 18:46 ` Junio C Hamano
0 siblings, 2 replies; 8+ messages in thread
From: David Woodhouse @ 2007-01-09 14:03 UTC (permalink / raw)
To: git; +Cc: Dave Jones, Junio C Hamano, chrisw
On Fri, 2006-05-12 at 17:46 +0100, David Woodhouse wrote:
> If you apply a patch with 'git-am', it takes the raw content of the
> From: header, in whatever character set that happens to be, and puts
> that content, untagged, into the commit object.
>
> That is almost _never_ the right thing to do, surely? Raw data in
> untagged character sets is marginally better than line noise.
>
> We should default to the '-u' behaviour, which converts to the character
> set which the git repo is stored in. It's not just for converting to
> UTF-8, although it looks like it was in the past. Now, however, it
> converts to the character set defined in the configuration. So even if
> it's a Luddite who for some reason is sticking with an obsolete
> character set, it gets that right.
This patch:
1. Fixes the default not to throw away the MIME information.
2. Adds a '-n' option with the old behaviour, although I can't
actually imagine why someone might find that desirable.
3. Aborts if the conversion fails, allowing the user to fix it
rather than silently corrupting the input. There's always the
new '-n' option if the user _really_ wants it corrupted. :)
diff --git a/Documentation/git-mailinfo.txt b/Documentation/git-mailinfo.txt
index ea0a065..97b6c5b 100644
--- a/Documentation/git-mailinfo.txt
+++ b/Documentation/git-mailinfo.txt
@@ -8,7 +8,7 @@ git-mailinfo - Extracts patch from a single e-mail message
SYNOPSIS
--------
-'git-mailinfo' [-k] [-u | --encoding=<encoding>] <msg> <patch>
+'git-mailinfo' [-k] [-n | --encoding=<encoding>] <msg> <patch>
DESCRIPTION
@@ -34,19 +34,20 @@ OPTIONS
-u::
By default, the commit log message, author name and
- author email are taken from the e-mail without any
- charset conversion, after minimally decoding MIME
- transfer encoding. This flag causes the resulting
- commit to be encoded in the encoding specified by
- i18n.commitencoding configuration (defaults to utf-8) by
- transliterating them.
+ author email are transliterated if necessary in order
+ to store them in the correct encoding as specified by
+ the i18n.commitencoding configuration option (which
+ defaults to UTF-8). This flag disables such character
+ set conversion; performing basic MIME transfer decoding
+ but then intentionally discarding the character set
+ information and using the raw byte sequences.
Note that the patch is always used as is without charset
- conversion, even with this flag.
+ conversion, even without this flag.
--encoding=<encoding>::
- Similar to -u but if the local convention is different
- from what is specified by i18n.commitencoding, this flag
- can be used to override it.
+ If the i18n.commitencoding configuration option is set
+ set incorrectly (or unset when the UTF-8 default is not
+ desired), this flag can be used to override it.
<msg>::
The commit log message extracted from e-mail, usually
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index a67f3eb..02bd183 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -518,8 +518,7 @@ static void convert_to_utf8(char *line, char *charset)
if (!out) {
fprintf(stderr, "cannot convert from %s to %s\n",
input_charset, metainfo_charset);
- *charset = 0;
- return;
+ exit(1);
}
strcpy(line, out);
free(out);
@@ -793,7 +792,7 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
}
static const char mailinfo_usage[] =
- "git-mailinfo [-k] [-u | --encoding=<encoding>] msg patch <mail >info";
+ "git-mailinfo [-k] [-n | --encoding=<encoding>] msg patch <mail >info";
int cmd_mailinfo(int argc, const char **argv, const char *prefix)
{
@@ -802,12 +801,16 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix)
*/
git_config(git_default_config);
+ metainfo_charset = (git_commit_encoding
+ ? git_commit_encoding : "utf-8");
+
while (1 < argc && argv[1][0] == '-') {
if (!strcmp(argv[1], "-k"))
keep_subject = 1;
- else if (!strcmp(argv[1], "-u"))
- metainfo_charset = (git_commit_encoding
- ? git_commit_encoding : "utf-8");
+ else if (!strcmp(argv[1], "-u"))
+ ; /* this is now default behaviour */
+ else if (!strcmp(argv[1], "-n"))
+ metainfo_charset = NULL;
else if (!strncmp(argv[1], "--encoding=", 11))
metainfo_charset = argv[1] + 11;
else
--
dwmw2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default.
2007-01-09 14:03 ` [PATCH] " David Woodhouse
@ 2007-01-09 14:08 ` Johannes Schindelin
2007-01-09 14:21 ` David Woodhouse
2007-01-09 18:46 ` Junio C Hamano
1 sibling, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2007-01-09 14:08 UTC (permalink / raw)
To: David Woodhouse; +Cc: git, Dave Jones, Junio C Hamano, chrisw
Hi,
On Tue, 9 Jan 2007, David Woodhouse wrote:
in the Documentation,
> -u::
needs to be replaced by "-n::", and
> + If the i18n.commitencoding configuration option is set
> + set incorrectly (or unset when the UTF-8 default is not
needs only one "set", not two.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default.
2007-01-09 14:08 ` Johannes Schindelin
@ 2007-01-09 14:21 ` David Woodhouse
0 siblings, 0 replies; 8+ messages in thread
From: David Woodhouse @ 2007-01-09 14:21 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Dave Jones, Junio C Hamano, chrisw
On Tue, 2007-01-09 at 15:08 +0100, Johannes Schindelin wrote:
> Hi,
>
> On Tue, 9 Jan 2007, David Woodhouse wrote:
>
> in the Documentation,
>
> > -u::
>
> needs to be replaced by "-n::", and
>
> > + If the i18n.commitencoding configuration option is set
> > + set incorrectly (or unset when the UTF-8 default is not
>
> needs only one "set", not two.
Oops; thanks.
diff --git a/Documentation/git-mailinfo.txt b/Documentation/git-mailinfo.txt
index ea0a065..e267053 100644
--- a/Documentation/git-mailinfo.txt
+++ b/Documentation/git-mailinfo.txt
@@ -8,7 +8,7 @@ git-mailinfo - Extracts patch from a single e-mail message
SYNOPSIS
--------
-'git-mailinfo' [-k] [-u | --encoding=<encoding>] <msg> <patch>
+'git-mailinfo' [-k] [-n | --encoding=<encoding>] <msg> <patch>
DESCRIPTION
@@ -32,21 +32,22 @@ OPTIONS
munging, and is most useful when used to read back 'git
format-patch --mbox' output.
--u::
+-n::
By default, the commit log message, author name and
- author email are taken from the e-mail without any
- charset conversion, after minimally decoding MIME
- transfer encoding. This flag causes the resulting
- commit to be encoded in the encoding specified by
- i18n.commitencoding configuration (defaults to utf-8) by
- transliterating them.
+ author email are transliterated if necessary in order
+ to store them in the correct encoding as specified by
+ the i18n.commitencoding configuration option (which
+ defaults to UTF-8). This flag disables such character
+ set conversion; performing basic MIME transfer decoding
+ but then intentionally discarding the character set
+ information and using the raw byte sequences.
Note that the patch is always used as is without charset
- conversion, even with this flag.
+ conversion, even without this flag.
--encoding=<encoding>::
- Similar to -u but if the local convention is different
- from what is specified by i18n.commitencoding, this flag
- can be used to override it.
+ If the i18n.commitencoding configuration option is set
+ incorrectly (or unset when the UTF-8 default is not
+ desired), this flag can be used to override it.
<msg>::
The commit log message extracted from e-mail, usually
diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index a67f3eb..02bd183 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -518,8 +518,7 @@ static void convert_to_utf8(char *line, char *charset)
if (!out) {
fprintf(stderr, "cannot convert from %s to %s\n",
input_charset, metainfo_charset);
- *charset = 0;
- return;
+ exit(1);
}
strcpy(line, out);
free(out);
@@ -793,7 +792,7 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
}
static const char mailinfo_usage[] =
- "git-mailinfo [-k] [-u | --encoding=<encoding>] msg patch <mail >info";
+ "git-mailinfo [-k] [-n | --encoding=<encoding>] msg patch <mail >info";
int cmd_mailinfo(int argc, const char **argv, const char *prefix)
{
@@ -802,12 +801,16 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix)
*/
git_config(git_default_config);
+ metainfo_charset = (git_commit_encoding
+ ? git_commit_encoding : "utf-8");
+
while (1 < argc && argv[1][0] == '-') {
if (!strcmp(argv[1], "-k"))
keep_subject = 1;
- else if (!strcmp(argv[1], "-u"))
- metainfo_charset = (git_commit_encoding
- ? git_commit_encoding : "utf-8");
+ else if (!strcmp(argv[1], "-u"))
+ ; /* this is now default behaviour */
+ else if (!strcmp(argv[1], "-n"))
+ metainfo_charset = NULL;
else if (!strncmp(argv[1], "--encoding=", 11))
metainfo_charset = argv[1] + 11;
else
--
dwmw2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default.
2007-01-09 14:03 ` [PATCH] " David Woodhouse
2007-01-09 14:08 ` Johannes Schindelin
@ 2007-01-09 18:46 ` Junio C Hamano
2007-01-09 23:49 ` David Woodhouse
1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-01-09 18:46 UTC (permalink / raw)
To: David Woodhouse; +Cc: git
David Woodhouse <dwmw2@infradead.org> writes:
> On Fri, 2006-05-12 at 17:46 +0100, David Woodhouse wrote:
>> ...
> This patch:
> 1. Fixes the default not to throw away the MIME information.
> 2. Adds a '-n' option with the old behaviour, although I can't
> actually imagine why someone might find that desirable.
> 3. Aborts if the conversion fails, allowing the user to fix it
> rather than silently corrupting the input. There's always the
> new '-n' option if the user _really_ wants it corrupted. :)
Thanks.
Documentation/SubmittingPatches, and Sign-off?
I do not think you would want to make '-n' in the third point
sound so negative and make people on projects that chose to use
legacy encoding for whatever reasons feel _dirty_. If the
natural language in project's log is limited and a legacy
encoding is sufficient, and if all the participants agree on a
legacy encoding to use because tools other than git they need to
use are more convenient with the legacy encoding rather than
UTF-8, there is no need to give a lecture to them saying they
should switch to UTF-8 and/or what they have been doing is
sub-par -- it isn't.
If the command allows straight-through (and I think it should)
but now defaults to UTF-8, you also need to update the existing
Porcelain-level tools (i.e. callers of mailinfo) so that they
pass -n when the end-user says "I want straight-through"; asking
for UTF-8 can be done by either not passing anything or
explicitly passing -u, but the point is that the callers need to
be changed anyway. And at that point, the default of mailinfo
does not matter that much -- although it is good for consistency
to make it also default to UTF-8.
I've updated git-am yesterday to default to --utf8 (which is
overridable with --no-utf8) but did not touch mailinfo during
that process; it further needs to be told about the -n option.
I haven't touched git-applymbox yet but it should also be taught
about the new default and the override.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default.
2007-01-09 18:46 ` Junio C Hamano
@ 2007-01-09 23:49 ` David Woodhouse
2007-01-10 0:55 ` Junio C Hamano
0 siblings, 1 reply; 8+ messages in thread
From: David Woodhouse @ 2007-01-09 23:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Tue, 2007-01-09 at 10:46 -0800, Junio C Hamano wrote:
> I do not think you would want to make '-n' in the third point
> sound so negative
No, I really _do_ want that.
> and make people on projects that chose to use
> legacy encoding for whatever reasons feel _dirty_.
... but not that, because it wasn't aimed at them.
> If the natural language in project's log is limited and a legacy
> encoding is sufficient, and if all the participants agree on a
> legacy encoding to use...
(...for git's own purely internal storage format).
That's not the use case for the -n option. Their case is what the
i18n.commitencoding configuration option exists for.
Although having said that, I don't actually know _why_ we let them
override the default, since it's _internal_ to git. As long as git
itself is correctly doing the conversion on the way in and out, there's
no reason for them to care whether we use UTF-8, UCS-4, EBCDIC or some
other arbitrary encoding (as long as our encoding can represent anything
they choose to throw at us).
> because tools other than git they need to
> use are more convenient with the legacy encoding rather than
> UTF-8,
That makes about as much sense to me as letting them configure git to
store objects uncompressed "because tools other than git are more
convenient without compression".
If our choice of _internal_ storage affects their other tools, then
either they're doing something very strange like poking at git objects
directly, or there's a bug in the git tools.
> there is no need to give a lecture to them saying they
> should switch to UTF-8 and/or what they have been doing is
> sub-par -- it isn't.
If people, for whatever reason, want git to use a given legacy character
set for its storage format, they just have to set i18n.commitencoding.
Those people aren't being lecture. (Although perhaps they _should_ be;
either they're poking at things which shouldn't concern them, or they
should be _reporting_ bugs instead of just working round them.)
The only people who would want the -n option would be those who _want_
to intentionally throw away the character set encoding, and have one
commit¹ in EBCDIC, a second in UTF-8 and a third in BIG5 with no way of
telling which is which; each of them _labelled_ with the default
encoding for the repository, which is probably UTF-8.
--
dwmw2
¹ Actually it's worse than that -- with RFC2047 you can have multiple
encodings within the same _line_ of text. Evolution at least will do that;
it uses ISO8859-1 for any character it can, and falls back to UTF-8 for
other characters. Even within the same header. Importing with '-n' would
just throw away the charset information and use the raw bytes. Even just
importing the RFC2047-encoded text as-is would be better than that.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default.
2007-01-09 23:49 ` David Woodhouse
@ 2007-01-10 0:55 ` Junio C Hamano
2007-01-10 3:24 ` David Woodhouse
0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-01-10 0:55 UTC (permalink / raw)
To: David Woodhouse; +Cc: git
David Woodhouse <dwmw2@infradead.org> writes:
> On Tue, 2007-01-09 at 10:46 -0800, Junio C Hamano wrote:
>> I do not think you would want to make '-n' in the third point
>> sound so negative
>
> No, I really _do_ want that.
>
>> and make people on projects that chose to use
>> legacy encoding for whatever reasons feel _dirty_.
>
> ... but not that, because it wasn't aimed at them.
>
>> If the natural language in project's log is limited and a legacy
>> encoding is sufficient, and if all the participants agree on a
>> legacy encoding to use...
> (...for git's own purely internal storage format).
>
> That's not the use case for the -n option. Their case is what the
> i18n.commitencoding configuration option exists for.
I think you missed a subtle difference.
For a project that is latin-1 only (or ISO-2022 only, for that
matter -- 'only' is the real keyword here), users did not have
to do anything, and happily kept using git, and that includes
that they did not have to set i18n.commitencoding to anything.
Defaulting to -u now means disrupt their established workflows
are disrupted by people coming from UTF-8 only world. They now
are forced to set i18n.commitencoding to latin-1 and/or use -n.
When we inconvenience others by making changes to make our own
life easier, it is not a good idea to insult them at the same
time; rather, we should be asking forgiveness from them.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default.
2007-01-10 0:55 ` Junio C Hamano
@ 2007-01-10 3:24 ` David Woodhouse
0 siblings, 0 replies; 8+ messages in thread
From: David Woodhouse @ 2007-01-10 3:24 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Tue, 2007-01-09 at 16:55 -0800, Junio C Hamano wrote:
> > That's not the use case for the -n option. Their case is what the
> > i18n.commitencoding configuration option exists for.
>
> I think you missed a subtle difference.
>
> For a project that is latin-1 only (or ISO-2022 only, for that
> matter -- 'only' is the real keyword here), users did not have
> to do anything, and happily kept using git, and that includes
> that they did not have to set i18n.commitencoding to anything.
Well, apart from the fact that gitweb and git-format-patch would be
mislabelling their output, for example.
> Defaulting to -u now means disrupt their established workflows
> are disrupted by people coming from UTF-8 only world. They now
> are forced to set i18n.commitencoding to latin-1 and/or use -n.
>
> When we inconvenience others by making changes to make our own
> life easier, it is not a good idea to insult them at the same
> time; rather, we should be asking forgiveness from them.
This is true. Sometimes when we fix bugs, we find that people were
relying on the old behaviour and are inconvenienced by the fix. That's
unfortunate, and we should make sure there's a simple way for those
people to adapt -- which in this case is setting 'i18n.commitencoding'.
I'm not entirely sure where the 'insulting' comes in though -- I think
you're referring to a comment which was
a) Only in my cover message rather than in the documentation, and
b) Not even relevant to these people -- these people should be
setting i18n.commitencoding to match their status quo, and my
disparaging comment was about the '-n' option which should be
avoided.
--
dwmw2
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-01-10 3:24 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-12 16:46 git-mailinfo '-u' argument should be default David Woodhouse
2007-01-09 14:03 ` [PATCH] " David Woodhouse
2007-01-09 14:08 ` Johannes Schindelin
2007-01-09 14:21 ` David Woodhouse
2007-01-09 18:46 ` Junio C Hamano
2007-01-09 23:49 ` David Woodhouse
2007-01-10 0:55 ` Junio C Hamano
2007-01-10 3:24 ` David Woodhouse
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).