* Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Constantine Plotnikov @ 2008-11-12 13:32 UTC
To: git

If the UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log,
git completes successfully, but commit messages and author information
are not shown. I suggest that git should fail with a fatal error if such
a zero-producing encoding is used.

If an incorrect encoding name is used, git log does not perform any
re-encoding but just displays commits in their native encoding. I
suggest that git should fail with a fatal error in this case as well.

Regards,
Constantine
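The second suggestion above (fail instead of silently skipping the re-encoding) can be sketched as follows. This is only an illustration and not Git's code: Git relies on iconv, whereas this sketch uses Python's codec registry as a stand-in, and the function name require_known_encoding is hypothetical.

```python
import codecs


def require_known_encoding(name: str) -> str:
    """Return the canonical codec name, or abort if the name is unknown.

    Hypothetical helper: Python's codec registry stands in for iconv here.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Mirror the "fail with a fatal error" behavior proposed in the report.
        raise SystemExit(f"fatal: unknown encoding '{name}'")
```

With such a check in front of the conversion, a misspelled encoding name stops the command immediately instead of producing output in the commits' native encoding.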
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Johannes Schindelin @ 2008-11-12 14:22 UTC
To: Constantine Plotnikov; +Cc: git

Hi,

On Wed, 12 Nov 2008, Constantine Plotnikov wrote:

> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log, the
> git completes successfully but commit messages and author information
> are not shown. I suggest that git should fail with fatal error if such
> zero producing encoding is used.
>
> If the incorrect encoding name is used, the git log does not perform any
> re-encoding, but just display commits in their native encoding. I
> suggest that git should fail with fatal error in this case as well.

Have you set the correct encoding with i18n.commitEncoding? If not, you
should not be surprised: Git's default encoding is UTF-8, and that fact is
well documented, AFAICT.

Ciao,
Dscho
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Johannes Schindelin @ 2008-11-12 16:15 UTC
To: Constantine Plotnikov; +Cc: git

Hi,

[re Cc:ing the list]

On Wed, 12 Nov 2008, Constantine Plotnikov wrote:

> On Wed, Nov 12, 2008 at 5:22 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> >
> >> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log, the
> >> git completes successfully but commit messages and author information
> >> are not shown. I suggest that git should fail with fatal error if such
> >> zero producing encoding is used.
> >>
> >> If the incorrect encoding name is used, the git log does not perform any
> >> re-encoding, but just display commits in their native encoding. I
> >> suggest that git should fail with fatal error in this case as well.
> >
> > Have you set the correct encoding with i18n.commitEncoding? If not, you
> > should not be surprised: Git's default encoding is UTF-8, and that fact is
> > well documented, AFAICT.
> >
> Commit encoding is set correctly. The problem is that git log and git
> show do not support the *output* encodings UTF-16 and UCS-4 and
> silently fail in that case instead of reporting the error.

That looks more like an iconv bug to me. I assume you are using Windows?

Ciao,
Dscho
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Alexander Gavrilov @ 2008-11-12 16:17 UTC
To: Johannes Schindelin; +Cc: Constantine Plotnikov, git

On Wed, Nov 12, 2008 at 7:15 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> [re Cc:ing the list]
>
> On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
>
>> On Wed, Nov 12, 2008 at 5:22 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>> >
>> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
>> >
>> >> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log, the
>> >> git completes successfully but commit messages and author information
>> >> are not shown. I suggest that git should fail with fatal error if such
>> >> zero producing encoding is used.
>> >>
>> >> If the incorrect encoding name is used, the git log does not perform any
>> >> re-encoding, but just display commits in their native encoding. I
>> >> suggest that git should fail with fatal error in this case as well.
>> >
>> > Have you set the correct encoding with i18n.commitEncoding? If not, you
>> > should not be surprised: Git's default encoding is UTF-8, and that fact is
>> > well documented, AFAICT.
>> >
>> Commit encoding is set correctly. The problem is that git log and git
>> show do not support the *output* encodings UTF-16 and UCS-4 and
>> silently fail in that case instead of reporting the error.
>
> That looks more like an iconv bug to me. I assume you are using Windows?
>

iconv has no way to know that Git cannot work with ASCII-incompatible
encodings, and UTF-16 is one of them, because it fills the output with
many zero bytes. Git both truncates messages at these bytes and forgets
to insert them into the strings that it produces itself.

A separate problem is that Git allows creating commits with invalid
encoding names, which may go unnoticed for a long time in an environment
with uniform i18n.commitEncoding settings.

Alexander
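The zero-byte problem described above can be reproduced outside Git. The following is a minimal sketch, not Git's code: the helper c_string_view is hypothetical and only imitates how a NUL-terminated C string appears to code that stops reading at the first zero byte.

```python
def c_string_view(buf: bytes) -> bytes:
    """Return what C string functions would see: everything before the first NUL."""
    return buf.split(b"\x00", 1)[0]


message = "Fix typo"

# In UTF-16, every ASCII character is paired with a zero byte.
utf16_le = message.encode("utf-16-le")
utf16_be = message.encode("utf-16-be")
nul_count = utf16_le.count(b"\x00")

# Little-endian output is cut off after the first character;
# big-endian output starts with a zero byte, so nothing is visible at all.
visible_le = c_string_view(utf16_le)
visible_be = c_string_view(utf16_be)

# UTF-8 never produces a zero byte for non-NUL characters, so it survives intact.
visible_utf8 = c_string_view(message.encode("utf-8"))
```

Under these assumptions the big-endian case matches the reported symptom: the command succeeds, but the message text appears empty.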
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Johannes Schindelin @ 2008-11-12 16:42 UTC
To: Alexander Gavrilov; +Cc: Constantine Plotnikov, git

Hi,

On Wed, 12 Nov 2008, Alexander Gavrilov wrote:

> On Wed, Nov 12, 2008 at 7:15 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > [re Cc:ing the list]
> >
> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> >
> >> On Wed, Nov 12, 2008 at 5:22 PM, Johannes Schindelin
> >> <Johannes.Schindelin@gmx.de> wrote:
> >> >
> >> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> >> >
> >> >> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log,
> >> >> the git completes successfully but commit messages and author
> >> >> information are not shown. I suggest that git should fail with
> >> >> fatal error if such zero producing encoding is used.
> >> >>
> >> >> If the incorrect encoding name is used, the git log does not
> >> >> perform any re-encoding, but just display commits in their native
> >> >> encoding. I suggest that git should fail with fatal error in this
> >> >> case as well.
> >> >
> >> > Have you set the correct encoding with i18n.commitEncoding? If
> >> > not, you should not be surprised: Git's default encoding is UTF-8,
> >> > and that fact is well documented, AFAICT.
> >> >
> >> Commit encoding is set correctly. The problem is that git log and git
> >> show do not support the *output* encodings UTF-16 and UCS-4 and
> >> silently fail in that case instead of reporting the error.
> >
> > That looks more like an iconv bug to me. I assume you are using Windows?
>
> Iconv has no way to know that git cannot work with ASCII-incompatible
> encodings, and UTF-16 is incompatible, because it fills the output with
> loads of zero bytes. Git both truncates messages on these bytes, and
> forgets inserting them in strings that it produces itself.

Ah, I thought that the issue was that Git would not handle commits in that
encoding correctly. Instead, it appears that Git cannot work with UTF-16
_displays_.

Yep, I would have expected that.

Ciao,
Dscho
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Alexander Gavrilov @ 2008-11-12 16:58 UTC
To: Johannes Schindelin; +Cc: Constantine Plotnikov, git

On Wed, Nov 12, 2008 at 7:42 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Wed, 12 Nov 2008, Alexander Gavrilov wrote:
>> On Wed, Nov 12, 2008 at 7:15 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
>> >> Commit encoding is set correctly. The problem is that git log and git
>> >> show do not support the *output* encodings UTF-16 and UCS-4 and
>> >> silently fail in that case instead of reporting the error.
>> >
>> > That looks more like an iconv bug to me. I assume you are using Windows?
>>
>> Iconv has no way to know that git cannot work with ASCII-incompatible
>> encodings, and UTF-16 is incompatible, because it fills the output with
>> loads of zero bytes. Git both truncates messages on these bytes, and
>> forgets inserting them in strings that it produces itself.
>
> Ah, I thought that the issue was that Git would not handle commits in that
> encoding correctly. Instead, it appears that Git cannot work with UTF-16
> _displays_.

Actually, I think that using those encodings in commits is asking for
trouble too, because the encoding conversion is, as far as I remember,
applied to the entire contents of the commit object, and Git, naturally,
doesn't insert any null bytes in the commit headers to please the
decoder. The result is a completely trashed object on output.

Also, I think that they are generally a poor choice of encoding for data
transmission, because they are ASCII-incompatible, stdlib-incompatible,
not robust against the loss or insertion of single bytes, and have no way
to detect an encoding mismatch except by metadata or heuristics: almost
any string of shorts is "valid".

Alexander
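The last two points (sensitivity to a single lost byte, and the lack of built-in mismatch detection) can be illustrated with a short sketch. It uses Python's built-in codecs only; the helper decodes_cleanly and the sample byte strings are made up for the illustration.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Report whether data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Latin-1 bytes misread as UTF-8 are rejected: 0xE9 starts a multi-byte
# sequence, but the byte that follows is not a continuation byte.
latin1_bytes = b"h\xe9llo"
utf8_detects_mismatch = not decodes_cleanly(latin1_bytes, "utf-8")

# Plain ASCII bytes misread as UTF-16 are accepted: each byte pair is
# simply taken as some unrelated 16-bit code unit.
ascii_bytes = b"fix typo"
utf16_accepts_garbage = decodes_cleanly(ascii_bytes, "utf-16-le")
garbage = ascii_bytes.decode("utf-16-le")

# Losing one leading byte shifts every code unit, yet the result still
# decodes without error (the trailing odd byte is dropped here as well).
shifted = "abcd".encode("utf-16-le")[1:7]
shifted_text = shifted.decode("utf-16-le")
```

The "almost" in "almost any string of shorts" matters: unpaired surrogate code units are still rejected by a strict UTF-16 decoder, so this sketch deliberately uses inputs that avoid them.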