* Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Constantine Plotnikov @ 2008-11-12 13:32 UTC (permalink / raw)
To: git
If a UTF-16[BE|LE] or UCS-4[BE|LE] encoding is used with git log, git
completes successfully, but commit messages and author information are
not shown. I suggest that git should fail with a fatal error if such a
zero-byte-producing encoding is used.

If a non-existent encoding name is used, git log does not perform any
re-encoding and just displays commits in their native encoding. I
suggest that git should fail with a fatal error in this case as well.
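
The "fail early" behavior suggested above could look roughly like the
following Python sketch. It is only an illustration, not git's code: the
function name check_output_encoding is hypothetical, and Python's codec
registry stands in for iconv, so the set of accepted names differs
(Python spells UCS-4 as "utf-32").

```python
import codecs


def check_output_encoding(name: str) -> str:
    """Return the normalized codec name, or raise ValueError if unusable.

    Hypothetical helper: rejects unknown names and encodings whose output
    for plain ASCII text differs from the ASCII bytes themselves.
    """
    try:
        codec = codecs.lookup(name)
    except LookupError:
        raise ValueError(f"fatal: unknown encoding '{name}'") from None

    # An ASCII-compatible encoding maps all 128 ASCII characters to the
    # same single bytes that ASCII itself uses.
    ascii_text = "".join(map(chr, range(128)))
    try:
        compatible = ascii_text.encode(name) == ascii_text.encode("ascii")
    except (LookupError, UnicodeError):
        compatible = False
    if not compatible:
        raise ValueError(f"fatal: encoding '{name}' is not ASCII-compatible")
    return codec.name


for candidate in ("UTF-8", "utf-16-le", "utf-32-be", "no-such-encoding"):
    try:
        print(candidate, "->", check_output_encoding(candidate))
    except ValueError as err:
        print(candidate, "->", err)
```

Checking the whole ASCII range, rather than a single character, is a
deliberately strict choice here; a real implementation might settle for a
weaker test.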
Regards,
Constantine
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Johannes Schindelin @ 2008-11-12 14:22 UTC (permalink / raw)
To: Constantine Plotnikov; +Cc: git
Hi,
On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log, the
> git completes successfully but commit messages and author information
> are not shown. I suggest that git should fail with fatal error if such
> zero producing encoding is used.
>
> If the incorrect encoding name is used, the git log does not perform any
> re-encoding, but just display commits in their native encoding. I
> suggest that git should fail with fatal error in this case as well.
Have you set the correct encoding with i18n.commitEncoding? If not, you
should not be surprised: Git's default encoding is UTF-8, and that fact is
well documented, AFAICT.
Ciao,
Dscho
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Johannes Schindelin @ 2008-11-12 16:15 UTC (permalink / raw)
To: Constantine Plotnikov; +Cc: git
Hi,
[re Cc:ing the list]
On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> On Wed, Nov 12, 2008 at 5:22 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> >
> >> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log, the
> >> git completes successfully but commit messages and author information
> >> are not shown. I suggest that git should fail with fatal error if such
> >> zero producing encoding is used.
> >>
> >> If the incorrect encoding name is used, the git log does not perform any
> >> re-encoding, but just display commits in their native encoding. I
> >> suggest that git should fail with fatal error in this case as well.
> >
> > Have you set the correct encoding with i18n.commitEncoding? If not, you
> > should not be surprised: Git's default encoding is UTF-8, and that fact is
> > well documented, AFAICT.
> >
> Commit encoding is set correctly. The problem is that git log and git
> show do not support the *output* encodings UTF-16 and UCS-4 and
> silently fail in that case instead of reporting the error.
That looks more like an iconv bug to me. I assume you are using Windows?
Ciao,
Dscho
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Alexander Gavrilov @ 2008-11-12 16:17 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Constantine Plotnikov, git
On Wed, Nov 12, 2008 at 7:15 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> [re Cc:ing the list]
>
> On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
>
>> On Wed, Nov 12, 2008 at 5:22 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>> >
>> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
>> >
>> >> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log, the
>> >> git completes successfully but commit messages and author information
>> >> are not shown. I suggest that git should fail with fatal error if such
>> >> zero producing encoding is used.
>> >>
>> >> If the incorrect encoding name is used, the git log does not perform any
>> >> re-encoding, but just display commits in their native encoding. I
>> >> suggest that git should fail with fatal error in this case as well.
>> >
>> > Have you set the correct encoding with i18n.commitEncoding? If not, you
>> > should not be surprised: Git's default encoding is UTF-8, and that fact is
>> > well documented, AFAICT.
>> >
>> Commit encoding is set correctly. The problem is that git log and git
>> show do not support the *output* encodings UTF-16 and UCS-4 and
>> silently fail in that case instead of reporting the error.
>
> That looks more like an iconv bug to me. I assume you are using Windows?
>
Iconv has no way to know that git cannot work with ASCII-incompatible
encodings. UTF-16 is incompatible because it fills the output with
zero bytes. Git both truncates messages at these bytes and fails to
insert them into the strings that it produces itself.
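
To make the truncation concrete, here is a small Python sketch. It only
mimics NUL-terminated string handling; the helper c_string_view is
hypothetical and is not taken from git's source.

```python
def c_string_view(buf: bytes) -> bytes:
    """Return the bytes a NUL-terminated C string routine would see."""
    return buf.split(b"\x00", 1)[0]


message = "Fix typo"
utf8 = message.encode("utf-8")
utf16le = message.encode("utf-16-le")
utf16be = message.encode("utf-16-be")

# In UTF-16, every ASCII character is paired with a zero byte.
print(utf16le)

# The UTF-8 message survives intact; the UTF-16 ones are cut at the
# first zero byte.
print(c_string_view(utf8))
print(c_string_view(utf16le))
print(c_string_view(utf16be))
```

With the little-endian form only the first byte is left, and with the
big-endian form nothing is left, which matches the "messages are not
shown" symptom reported at the start of the thread.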

A separate problem is that git allows creating commits with invalid
encoding names, which may go unnoticed for a long time in an
environment with uniform i18n.commitEncoding settings.
Alexander
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Johannes Schindelin @ 2008-11-12 16:42 UTC (permalink / raw)
To: Alexander Gavrilov; +Cc: Constantine Plotnikov, git
Hi,
On Wed, 12 Nov 2008, Alexander Gavrilov wrote:
> On Wed, Nov 12, 2008 at 7:15 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > [re Cc:ing the list]
> >
> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> >
> >> On Wed, Nov 12, 2008 at 5:22 PM, Johannes Schindelin
> >> <Johannes.Schindelin@gmx.de> wrote:
> >> >
> >> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
> >> >
> >> >> If UTF-16[BE|LE] or UCS-4[BE|LE] encodings are used with git log,
> >> >> the git completes successfully but commit messages and author
> >> >> information are not shown. I suggest that git should fail with
> >> >> fatal error if such zero producing encoding is used.
> >> >>
> >> >> If the incorrect encoding name is used, the git log does not
> >> >> perform any re-encoding, but just display commits in their native
> >> >> encoding. I suggest that git should fail with fatal error in this
> >> >> case as well.
> >> >
> >> > Have you set the correct encoding with i18n.commitEncoding? If
> >> > not, you should not be surprised: Git's default encoding is UTF-8,
> >> > and that fact is well documented, AFAICT.
> >> >
> >> Commit encoding is set correctly. The problem is that git log and git
> >> show do not support the *output* encodings UTF-16 and UCS-4 and
> >> silently fail in that case instead of reporting the error.
> >
> > That looks more like an iconv bug to me. I assume you are using Windows?
>
> Iconv has no way to know that git cannot work with ASCII-incompatible
> encodings, and UTF-16 is incompatible, because it fills the output with
> loads of zero bytes. Git both truncates messages on these bytes, and
> forgets inserting them in strings that it produces itself.
Ah, I thought that the issue was that Git would not handle commits in that
encoding correctly. Instead, it appears that Git cannot work with UTF-16
_displays_.
Yep, I would have expected that.
Ciao,
Dscho
* Re: Bug: UTF-16, UCS-4 and non-existing encodings for git log result in incorrect behavior
From: Alexander Gavrilov @ 2008-11-12 16:58 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Constantine Plotnikov, git
On Wed, Nov 12, 2008 at 7:42 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Wed, 12 Nov 2008, Alexander Gavrilov wrote:
>> On Wed, Nov 12, 2008 at 7:15 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>> > On Wed, 12 Nov 2008, Constantine Plotnikov wrote:
>> >> Commit encoding is set correctly. The problem is that git log and git
>> >> show do not support the *output* encodings UTF-16 and UCS-4 and
>> >> silently fail in that case instead of reporting the error.
>> >
>> > That looks more like an iconv bug to me. I assume you are using Windows?
>>
>> Iconv has no way to know that git cannot work with ASCII-incompatible
>> encodings, and UTF-16 is incompatible, because it fills the output with
>> loads of zero bytes. Git both truncates messages on these bytes, and
>> forgets inserting them in strings that it produces itself.
>
> Ah, I thought that the issue was that Git would not handle commits in that
> encoding correctly. Instead, it appears that Git cannot work with UTF-16
> _displays_.
Actually, I think that using those encodings in commits is asking for
trouble too, because the encoding conversion is, as far as I remember,
applied to the entire contents of the commit object, and Git,
naturally, doesn't insert any null bytes in the commit headers to
please the decoder. The result is a completely trashed object on
output.
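
A rough Python sketch of that failure mode follows. The object layout is
a simplified stand-in (a single ASCII header line followed by a UTF-16
body), not git's real commit format, and reencode_whole_object is a
hypothetical name for "convert everything in one pass".

```python
def reencode_whole_object(raw: bytes, source: str, target: str) -> bytes:
    """Convert the entire buffer, headers included, from source to target."""
    return raw.decode(source).encode(target)


# ASCII header as written by the tool, followed by a UTF-16LE message body.
header = b"author A U Thor <author@example.com>\n\n"
body = "Fix typo".encode("utf-16-le")
assert len(header) % 2 == 0  # keeps the body aligned on 16-bit units

raw = header + body
converted = reencode_whole_object(raw, "utf-16-le", "utf-8")

# The body decodes correctly, but each pair of ASCII header bytes was
# read as one 16-bit code unit, so the header text is gone.
print(converted.endswith(b"Fix typo"))
print(b"author" in converted)
```

Note that the decoder raises no error here: the ASCII header bytes happen
to form valid UTF-16 code units, so the damage is silent.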

Also, I think that they are generally a poor choice of encoding for
data transmission, because they are ASCII-incompatible,
stdlib-incompatible, not robust against the loss or addition of single
bytes, and offer no way to detect an encoding mismatch except by
metadata or heuristics: almost any string of shorts is "valid".
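
The last two points can be illustrated with a short Python sketch. The
helper names are hypothetical; it compares what happens when a single
byte is lost from UTF-16 data and from UTF-8 data.

```python
def drop_first_byte_utf16(text: str) -> str:
    """Encode as UTF-16LE, lose the first byte, and decode what is left."""
    data = text.encode("utf-16-le")[1:]
    data = data[: len(data) - len(data) % 2]  # keep whole 16-bit units only
    return data.decode("utf-16-le")


def drop_byte_utf8(text: str, index: int) -> str:
    """Encode as UTF-8, lose the byte at index, and decode what is left."""
    data = text.encode("utf-8")
    return (data[:index] + data[index + 1:]).decode("utf-8")


# UTF-16: every following unit is misaligned, yet decoding still succeeds.
garbled = drop_first_byte_utf16("commit message")
print(garbled == "commit message")

# UTF-8: losing an ASCII byte costs exactly one character...
print(drop_byte_utf8("commit message", 0))

# ...and losing part of a multi-byte sequence is detected by the decoder.
try:
    drop_byte_utf8("caf\u00e9", 3)
except UnicodeDecodeError as err:
    print("detected:", err.reason)
```

The UTF-16 result is a string of unrelated characters that a decoder
accepts without complaint, which is the sense in which almost any
sequence of 16-bit units counts as "valid".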
Alexander