* git-mailinfo '-u' argument should be default. @ 2006-05-12 16:46 David Woodhouse 2007-01-09 14:03 ` [PATCH] " David Woodhouse 0 siblings, 1 reply; 8+ messages in thread From: David Woodhouse @ 2006-05-12 16:46 UTC (permalink / raw) To: git If you apply a patch with 'git-am', it takes the raw content of the From: header, in whatever character set that happens to be, and puts that content, untagged, into the commit object. That is almost _never_ the right thing to do, surely? Raw data in untagged character sets is marginally better than line noise. We should default to the '-u' behaviour, which converts to the character set which the git repo is stored in. It's not just for converting to UTF-8, although it looks like it was in the past. Now, however, it converts to the character set defined in the configuration. So even if it's a Luddite who for some reason is sticking with an obsolete character set, it gets that right. The only option which makes sense _other_ than that would be to just stick the RFC2047-encoded original From: header into the commit, surely? -- dwmw2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] Re: git-mailinfo '-u' argument should be default. 2006-05-12 16:46 git-mailinfo '-u' argument should be default David Woodhouse @ 2007-01-09 14:03 ` David Woodhouse 2007-01-09 14:08 ` Johannes Schindelin 2007-01-09 18:46 ` Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: David Woodhouse @ 2007-01-09 14:03 UTC (permalink / raw) To: git; +Cc: Dave Jones, Junio C Hamano, chrisw On Fri, 2006-05-12 at 17:46 +0100, David Woodhouse wrote: > If you apply a patch with 'git-am', it takes the raw content of the > From: header, in whatever character set that happens to be, and puts > that content, untagged, into the commit object. > > That is almost _never_ the right thing to do, surely? Raw data in > untagged character sets is marginally better than line noise. > > We should default to the '-u' behaviour, which converts to the character > set which the git repo is stored in. It's not just for converting to > UTF-8, although it looks like it was in the past. Now, however, it > converts to the character set defined in the configuration. So even if > it's a Luddite who for some reason is sticking with an obsolete > character set, it gets that right. This patch: 1. Fixes the default not to throw away the MIME information. 2. Adds a '-n' option with the old behaviour, although I can't actually imagine why someone might find that desirable. 3. Aborts if the conversion fails, allowing the user to fix it rather than silently corrupting the input. There's always the new '-n' option if the user _really_ wants it corrupted. :) diff --git a/Documentation/git-mailinfo.txt b/Documentation/git-mailinfo.txt index ea0a065..97b6c5b 100644 --- a/Documentation/git-mailinfo.txt +++ b/Documentation/git-mailinfo.txt @@ -8,7 +8,7 @@ git-mailinfo - Extracts patch from a single e-mail message SYNOPSIS -------- -'git-mailinfo' [-k] [-u | --encoding=<encoding>] <msg> <patch> +'git-mailinfo' [-k] [-n | --encoding=<encoding>] <msg> <patch> DESCRIPTION @@ -34,19 +34,20 @@ OPTIONS -u:: By default, the commit log message, author name and - author email are taken from the e-mail without any - charset conversion, after minimally decoding MIME - transfer encoding. This flag causes the resulting - commit to be encoded in the encoding specified by - i18n.commitencoding configuration (defaults to utf-8) by - transliterating them. + author email are transliterated if necessary in order + to store them in the correct encoding as specified by + the i18n.commitencoding configuration option (which + defaults to UTF-8). This flag disables such character + set conversion; performing basic MIME transfer decoding + but then intentionally discarding the character set + information and using the raw byte sequences. Note that the patch is always used as is without charset - conversion, even with this flag. + conversion, even without this flag. --encoding=<encoding>:: - Similar to -u but if the local convention is different - from what is specified by i18n.commitencoding, this flag - can be used to override it. + If the i18n.commitencoding configuration option is set + set incorrectly (or unset when the UTF-8 default is not + desired), this flag can be used to override it. <msg>:: The commit log message extracted from e-mail, usually diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c index a67f3eb..02bd183 100644 --- a/builtin-mailinfo.c +++ b/builtin-mailinfo.c @@ -518,8 +518,7 @@ static void convert_to_utf8(char *line, char *charset) if (!out) { fprintf(stderr, "cannot convert from %s to %s\n", input_charset, metainfo_charset); - *charset = 0; - return; + exit(1); } strcpy(line, out); free(out); @@ -793,7 +792,7 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding, } static const char mailinfo_usage[] = - "git-mailinfo [-k] [-u | --encoding=<encoding>] msg patch <mail >info"; + "git-mailinfo [-k] [-n | --encoding=<encoding>] msg patch <mail >info"; int cmd_mailinfo(int argc, const char **argv, const char *prefix) { @@ -802,12 +801,16 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix) */ git_config(git_default_config); + metainfo_charset = (git_commit_encoding + ? git_commit_encoding : "utf-8"); + while (1 < argc && argv[1][0] == '-') { if (!strcmp(argv[1], "-k")) keep_subject = 1; - else if (!strcmp(argv[1], "-u")) - metainfo_charset = (git_commit_encoding - ? git_commit_encoding : "utf-8"); + else if (!strcmp(argv[1], "-u")) + ; /* this is now default behaviour */ + else if (!strcmp(argv[1], "-n")) + metainfo_charset = NULL; else if (!strncmp(argv[1], "--encoding=", 11)) metainfo_charset = argv[1] + 11; else -- dwmw2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default. 2007-01-09 14:03 ` [PATCH] " David Woodhouse @ 2007-01-09 14:08 ` Johannes Schindelin 2007-01-09 14:21 ` David Woodhouse 2007-01-09 18:46 ` Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Johannes Schindelin @ 2007-01-09 14:08 UTC (permalink / raw) To: David Woodhouse; +Cc: git, Dave Jones, Junio C Hamano, chrisw Hi, On Tue, 9 Jan 2007, David Woodhouse wrote: in the Documentation, > -u:: needs to be replaced by "-n::", and > + If the i18n.commitencoding configuration option is set > + set incorrectly (or unset when the UTF-8 default is not needs only one "set", not two. Ciao, Dscho ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default. 2007-01-09 14:08 ` Johannes Schindelin @ 2007-01-09 14:21 ` David Woodhouse 0 siblings, 0 replies; 8+ messages in thread From: David Woodhouse @ 2007-01-09 14:21 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Dave Jones, Junio C Hamano, chrisw On Tue, 2007-01-09 at 15:08 +0100, Johannes Schindelin wrote: > Hi, > > On Tue, 9 Jan 2007, David Woodhouse wrote: > > in the Documentation, > > > -u:: > > needs to be replaced by "-n::", and > > > + If the i18n.commitencoding configuration option is set > > + set incorrectly (or unset when the UTF-8 default is not > > needs only one "set", not two. Oops; thanks. diff --git a/Documentation/git-mailinfo.txt b/Documentation/git-mailinfo.txt index ea0a065..e267053 100644 --- a/Documentation/git-mailinfo.txt +++ b/Documentation/git-mailinfo.txt @@ -8,7 +8,7 @@ git-mailinfo - Extracts patch from a single e-mail message SYNOPSIS -------- -'git-mailinfo' [-k] [-u | --encoding=<encoding>] <msg> <patch> +'git-mailinfo' [-k] [-n | --encoding=<encoding>] <msg> <patch> DESCRIPTION @@ -32,21 +32,22 @@ OPTIONS munging, and is most useful when used to read back 'git format-patch --mbox' output. --u:: +-n:: By default, the commit log message, author name and - author email are taken from the e-mail without any - charset conversion, after minimally decoding MIME - transfer encoding. This flag causes the resulting - commit to be encoded in the encoding specified by - i18n.commitencoding configuration (defaults to utf-8) by - transliterating them. + author email are transliterated if necessary in order + to store them in the correct encoding as specified by + the i18n.commitencoding configuration option (which + defaults to UTF-8). This flag disables such character + set conversion; performing basic MIME transfer decoding + but then intentionally discarding the character set + information and using the raw byte sequences. Note that the patch is always used as is without charset - conversion, even with this flag. + conversion, even without this flag. --encoding=<encoding>:: - Similar to -u but if the local convention is different - from what is specified by i18n.commitencoding, this flag - can be used to override it. + If the i18n.commitencoding configuration option is set + incorrectly (or unset when the UTF-8 default is not + desired), this flag can be used to override it. <msg>:: The commit log message extracted from e-mail, usually diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c index a67f3eb..02bd183 100644 --- a/builtin-mailinfo.c +++ b/builtin-mailinfo.c @@ -518,8 +518,7 @@ static void convert_to_utf8(char *line, char *charset) if (!out) { fprintf(stderr, "cannot convert from %s to %s\n", input_charset, metainfo_charset); - *charset = 0; - return; + exit(1); } strcpy(line, out); free(out); @@ -793,7 +792,7 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding, } static const char mailinfo_usage[] = - "git-mailinfo [-k] [-u | --encoding=<encoding>] msg patch <mail >info"; + "git-mailinfo [-k] [-n | --encoding=<encoding>] msg patch <mail >info"; int cmd_mailinfo(int argc, const char **argv, const char *prefix) { @@ -802,12 +801,16 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix) */ git_config(git_default_config); + metainfo_charset = (git_commit_encoding + ? git_commit_encoding : "utf-8"); + while (1 < argc && argv[1][0] == '-') { if (!strcmp(argv[1], "-k")) keep_subject = 1; - else if (!strcmp(argv[1], "-u")) - metainfo_charset = (git_commit_encoding - ? git_commit_encoding : "utf-8"); + else if (!strcmp(argv[1], "-u")) + ; /* this is now default behaviour */ + else if (!strcmp(argv[1], "-n")) + metainfo_charset = NULL; else if (!strncmp(argv[1], "--encoding=", 11)) metainfo_charset = argv[1] + 11; else -- dwmw2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default. 2007-01-09 14:03 ` [PATCH] " David Woodhouse 2007-01-09 14:08 ` Johannes Schindelin @ 2007-01-09 18:46 ` Junio C Hamano 2007-01-09 23:49 ` David Woodhouse 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2007-01-09 18:46 UTC (permalink / raw) To: David Woodhouse; +Cc: git David Woodhouse <dwmw2@infradead.org> writes: > On Fri, 2006-05-12 at 17:46 +0100, David Woodhouse wrote: >> ... > This patch: > 1. Fixes the default not to throw away the MIME information. > 2. Adds a '-n' option with the old behaviour, although I can't > actually imagine why someone might find that desirable. > 3. Aborts if the conversion fails, allowing the user to fix it > rather than silently corrupting the input. There's always the > new '-n' option if the user _really_ wants it corrupted. :) Thanks. Documentation/SubmittingPatches, and Sign-off? I do not think you would want to make '-n' in the third point sound so negative and make people on projects that chose to use legacy encoding for whatever reasons feel _dirty_. If the natural language in project's log is limited and a legacy encoding is sufficient, and if all the participants agree on a legacy encoding to use because tools other than git they need to use are more convenient with the legacy encoding rather than UTF-8, there is no need to give a lecture to them saying they should switch to UTF-8 and/or what they have been doing is sub-par -- it isn't. If the command allows straight-through (and I think it should) but now defaults to UTF-8, you also need to update the existing Porcelain-level tools (i.e. callers of mailinfo) so that they pass -n when the end-user says "I want straight-through"; asking for UTF-8 can be done by either not passing anything or explicitly passing -u, but the point is that the callers need to be changed anyway. And at that point, the default of mailinfo does not matter that much -- although it is good for consistency to make it also default to UTF-8. I've updated git-am yesterday to default to --utf8 (which is overridable with --no-utf8) but did not touch mailinfo during that process; it further needs to be told about the -n option. I haven't touched git-applymbox yet but it should also be taught about the new default and the override. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default. 2007-01-09 18:46 ` Junio C Hamano @ 2007-01-09 23:49 ` David Woodhouse 2007-01-10 0:55 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: David Woodhouse @ 2007-01-09 23:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, 2007-01-09 at 10:46 -0800, Junio C Hamano wrote: > I do not think you would want to make '-n' in the third point > sound so negative No, I really _do_ want that. > and make people on projects that chose to use > legacy encoding for whatever reasons feel _dirty_. ... but not that, because it wasn't aimed at them. > If the natural language in project's log is limited and a legacy > encoding is sufficient, and if all the participants agree on a > legacy encoding to use... (...for git's own purely internal storage format). That's not the use case for the -n option. Their case is what the i18n.commitencoding configuration option exists for. Although having said that, I don't actually know _why_ we let them override the default, since it's _internal_ to git. As long as git itself is correctly doing the conversion on the way in and out, there's no reason for them to care whether we use UTF-8, UCS-4, EBCDIC or some other arbitrary encoding (as long as our encoding can represent anything they choose to throw at us). > because tools other than git they need to > use are more convenient with the legacy encoding rather than > UTF-8, That makes about as much sense to me as letting them configure git to store objects uncompressed "because tools other than git are more convenient without compression". If our choice of _internal_ storage affects their other tools, then either they're doing something very strange like poking at git objects directly, or there's a bug in the git tools. > there is no need to give a lecture to them saying they > should switch to UTF-8 and/or what they have been doing is > sub-par -- it isn't. If people, for whatever reason, want git to use a given legacy character set for its storage format, they just have to set i18n.commitencoding. Those people aren't being lecture. (Although perhaps they _should_ be; either they're poking at things which shouldn't concern them, or they should be _reporting_ bugs instead of just working round them.) The only people who would want the -n option would be those who _want_ to intentionally throw away the character set encoding, and have one commit¹ in EBCDIC, a second in UTF-8 and a third in BIG5 with no way of telling which is which; each of them _labelled_ with the default encoding for the repository, which is probably UTF-8. -- dwmw2 ¹ Actually it's worse than that -- with RFC2047 you can have multiple encodings within the same _line_ of text. Evolution at least will do that; it uses ISO8859-1 for any character it can, and falls back to UTF-8 for other characters. Even within the same header. Importing with '-n' would just throw away the charset information and use the raw bytes. Even just importing the RFC2047-encoded text as-is would be better than that. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default. 2007-01-09 23:49 ` David Woodhouse @ 2007-01-10 0:55 ` Junio C Hamano 2007-01-10 3:24 ` David Woodhouse 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2007-01-10 0:55 UTC (permalink / raw) To: David Woodhouse; +Cc: git David Woodhouse <dwmw2@infradead.org> writes: > On Tue, 2007-01-09 at 10:46 -0800, Junio C Hamano wrote: >> I do not think you would want to make '-n' in the third point >> sound so negative > > No, I really _do_ want that. > >> and make people on projects that chose to use >> legacy encoding for whatever reasons feel _dirty_. > > ... but not that, because it wasn't aimed at them. > >> If the natural language in project's log is limited and a legacy >> encoding is sufficient, and if all the participants agree on a >> legacy encoding to use... > (...for git's own purely internal storage format). > > That's not the use case for the -n option. Their case is what the > i18n.commitencoding configuration option exists for. I think you missed a subtle difference. For a project that is latin-1 only (or ISO-2022 only, for that matter -- 'only' is the real keyword here), users did not have to do anything, and happily kept using git, and that includes that they did not have to set i18n.commitencoding to anything. Defaulting to -u now means disrupt their established workflows are disrupted by people coming from UTF-8 only world. They now are forced to set i18n.commitencoding to latin-1 and/or use -n. When we inconvenience others by making changes to make our own life easier, it is not a good idea to insult them at the same time; rather, we should be asking forgiveness from them. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Re: git-mailinfo '-u' argument should be default. 2007-01-10 0:55 ` Junio C Hamano @ 2007-01-10 3:24 ` David Woodhouse 0 siblings, 0 replies; 8+ messages in thread From: David Woodhouse @ 2007-01-10 3:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, 2007-01-09 at 16:55 -0800, Junio C Hamano wrote: > > That's not the use case for the -n option. Their case is what the > > i18n.commitencoding configuration option exists for. > > I think you missed a subtle difference. > > For a project that is latin-1 only (or ISO-2022 only, for that > matter -- 'only' is the real keyword here), users did not have > to do anything, and happily kept using git, and that includes > that they did not have to set i18n.commitencoding to anything. Well, apart from the fact that gitweb and git-format-patch would be mislabelling their output, for example. > Defaulting to -u now means disrupt their established workflows > are disrupted by people coming from UTF-8 only world. They now > are forced to set i18n.commitencoding to latin-1 and/or use -n. > > When we inconvenience others by making changes to make our own > life easier, it is not a good idea to insult them at the same > time; rather, we should be asking forgiveness from them. This is true. Sometimes when we fix bugs, we find that people were relying on the old behaviour and are inconvenienced by the fix. That's unfortunate, and we should make sure there's a simple way for those people to adapt -- which in this case is setting 'i18n.commitencoding'. I'm not entirely sure where the 'insulting' comes in though -- I think you're referring to a comment which was a) Only in my cover message rather than in the documentation, and b) Not even relevant to these people -- these people should be setting i18n.commitencoding to match their status quo, and my disparaging comment was about the '-n' option which should be avoided. -- dwmw2 ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-01-10 3:24 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-05-12 16:46 git-mailinfo '-u' argument should be default David Woodhouse 2007-01-09 14:03 ` [PATCH] " David Woodhouse 2007-01-09 14:08 ` Johannes Schindelin 2007-01-09 14:21 ` David Woodhouse 2007-01-09 18:46 ` Junio C Hamano 2007-01-09 23:49 ` David Woodhouse 2007-01-10 0:55 ` Junio C Hamano 2007-01-10 3:24 ` David Woodhouse
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).