From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: Kevin Daudt <git@lists.ikke.info>
Cc: git@vger.kernel.org, larsxschneider@gmail.com
Subject: Re: t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux)
Date: Fri, 8 Feb 2019 00:17:05 +0000
Message-ID: <20190208001705.GC11927@genre.crustytoothpaste.net>
In-Reply-To: <20190207215935.GA31515@alpha>

[Please skip using Reply-To and instead use Mail-Followup-To so that
responses also go to the list.]

On Thu, Feb 07, 2019 at 10:59:35PM +0100, Kevin Daudt wrote:
> I'm trying to get the git test suite passing on Alpine Linux, which is
> based on musl libc.
> 
> All tests in t0028-working-tree-encoding.sh are currently failing,
> because musl iconv does not support stateful output of UTF-16/32 (e.g.,
> it does not output a BOM), while git is expecting that to be present:
> 
> > hint: The file 'test.utf16' is missing a byte order mark (BOM). Please
> > use UTF-16BE or UTF-16LE (depending on the byte order) as
> > working-tree-encoding.
> > fatal: BOM is required in 'test.utf16' if encoded as utf-16
> 
> Because adding the file to git fails, all the other tests fail as well
> as they expect the file to be present in the repository.
> 
> Any idea how to get around this?

I think musl needs to patch their libc. RFC 2781 says that if there's no
BOM in UTF-16, then "the text SHOULD be interpreted as being
big-endian."

Unfortunately for all of us, many Windows-based programs have chosen to
ignore that advice (technically, it's only a SHOULD) and interpret it as
little-endian instead. Git can't safely assume anything about the
endianness of a UTF-16 stream that doesn't contain a BOM. Technically,
since the RFC doesn't specify a MUST requirement, musl can't, either.
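
To illustrate the ambiguity, here is a minimal Python sketch (not Git's
actual code; the helper name detect_utf16_bom is hypothetical). It checks
for a UTF-16 BOM and shows that the same two BOM-less bytes decode to
different characters depending on the byte order the reader assumes:

```python
import codecs


def detect_utf16_bom(data: bytes):
    """Return the byte order named by a leading UTF-16 BOM, or None.

    Hypothetical helper for illustration only; it does not try to
    distinguish UTF-16 from UTF-32, whose little-endian BOM begins with
    the same two bytes.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # b"\xfe\xff"
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # b"\xff\xfe"
        return "utf-16-le"
    return None  # no BOM: the byte order cannot be determined from the data


# Two bytes with no BOM in front of them.
ambiguous = b"\x00A"

as_big_endian = ambiguous.decode("utf-16-be")     # code unit 0x0041
as_little_endian = ambiguous.decode("utf-16-le")  # code unit 0x4100

print(detect_utf16_bom(ambiguous))
print(repr(as_big_endian), repr(as_little_endian))
```

Both decodings succeed without error, so a reader cannot tell from the
bytes alone which interpretation the writer intended.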

Even if Git were to produce a BOM to work around this issue, then we'd
still have the problem that any program using musl will write data in
UTF-16 without a BOM. Moreover, because musl, in violation of the RFC,
doesn't read and process BOMs, someone using little-endian UTF-16 (with
a proper BOM) with musl and Git will have their data corrupted,
according to my reading of the musl website.
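
As a rough sketch of the kind of corruption I mean (a Python model of a
decoder that ignores the BOM and assumes big-endian; this is an assumed
model for illustration, not musl's code, and the function names are
hypothetical):

```python
import codecs


def decode_ignoring_bom(data: bytes) -> str:
    # Hypothetical model: never inspects the BOM, always assumes big-endian.
    return data.decode("utf-16-be")


def decode_honoring_bom(data: bytes) -> str:
    # Python's "utf-16" codec reads a leading BOM, uses it to select the
    # byte order, and strips it from the result.
    return data.decode("utf-16")


# Little-endian UTF-16 with a proper BOM.
payload = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

print(repr(decode_honoring_bom(payload)))
print(repr(decode_ignoring_bom(payload)))
```

The BOM-aware decoder recovers the original text, while the model that
ignores the BOM yields three unrelated code points, with the byte-swapped
BOM itself showing up as data.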

In other words, I believe this test is failing legitimately.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

Thread overview: 30+ messages
2019-02-07 21:59 t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Kevin Daudt
2019-02-08  0:17 ` brian m. carlson [this message]
2019-02-08  6:04   ` Rich Felker
2019-02-08 11:45     ` brian m. carlson
2019-02-08 11:55       ` Kevin Daudt
2019-02-08 13:51         ` brian m. carlson
2019-02-08 17:50           ` Junio C Hamano
2019-02-08 20:23             ` Kevin Daudt
2019-02-08 20:42               ` brian m. carlson
2019-02-08 23:12                 ` Junio C Hamano
2019-02-09  0:24                   ` brian m. carlson
2019-02-09 14:57                 ` Kevin Daudt
2019-02-09 20:08                   ` [PATCH] utf8: handle systems that don't write BOM for UTF-16 brian m. carlson
2019-02-10  1:45                     ` Eric Sunshine
2019-02-10 18:14                       ` brian m. carlson
2019-02-10  8:04                     ` Torsten Bögershausen
2019-02-10 18:55                       ` brian m. carlson
2019-02-11 17:14                         ` Junio C Hamano
2019-02-11  0:23                     ` [PATCH v2] " brian m. carlson
2019-02-11  1:16                       ` Eric Sunshine
2019-02-11  1:20                         ` brian m. carlson
2019-02-11  1:26                     ` [PATCH v3] " brian m. carlson
2019-02-11 21:43                       ` Kevin Daudt
2019-02-11 23:58                         ` brian m. carlson
2019-02-12  0:31                           ` Junio C Hamano
2019-02-12  0:53                             ` brian m. carlson
2019-02-12  2:43                               ` Junio C Hamano
2019-02-12  0:52                     ` [PATCH v4] " brian m. carlson
2019-02-08 16:13         ` t0028-working-tree-encoding.sh failing on musl based systems (Alpine Linux) Rich Felker
2019-02-09  8:09     ` Torsten Bögershausen
