From: "Pali Rohár" <pali.rohar@gmail.com>
To: "Vladimir 'phcoder' Serbinenko" <phcoder@gmail.com>
Cc: The development of GNU GRUB <grub-devel@gnu.org>
Subject: Re: [PATCH] * grub-core/fs/udf.c: Add support for UUID
Date: Fri, 12 May 2017 16:39:14 +0200
Message-ID: <201705121639.14849@pali>
In-Reply-To: <CAEaD8JM7T8KPz1UFmJwsq7bNAKmAbWVYQFxpXTNZxzYz_6hr-w@mail.gmail.com>
On Monday 08 May 2017 15:13:28 Vladimir 'phcoder' Serbinenko wrote:
> On Mon, Apr 10, 2017, 23:17 Pali Rohár <pali.rohar@gmail.com> wrote:
> > -read_string (const grub_uint8_t *raw, grub_size_t sz, char *outbuf)
> > +read_string (const grub_uint8_t *raw, grub_size_t sz, char *outbuf, int normalize_utf8)
>
> Normalize isn't the right word. And it's not utf-8 but latin1 (called
> compressed utf-16 by udf docs).
> Are you sure you handle utf-16 case correctly? What is the expected
> behavior in those cases? Ideally you may want to just parse raw
> string in caller
Hi! I looked at the OSTA UDF spec again and found the reason for my
confusion: libblkid implements 8-bit OSTA compressed Unicode incorrectly,
and I simply tried to mimic libblkid in GRUB.
libblkid handles 16-bit OSTA compressed Unicode as UTF-16BE and 8-bit OSTA
compressed Unicode as UTF-8.
The UDF 2.01 specification says:
====
For a CompressionID of 8 or 16, the value of the CompressionID shall
specify the number of BitsPerCharacter for the d-characters defined in
the CharacterBitStream field. Each sequence of CompressionID bits in the
CharacterBitStream field shall represent an OSTA Compressed Unicode
d-character. The bits of the character being encoded shall be added to the
CharacterBitStream from most- to least-significant-bit. The bits shall
be added to the CharacterBitStream starting from the most significant
bit of the current byte being encoded into. The value of the OSTA
Compressed Unicode d-character interpreted as a Uint16 defines the value
of the corresponding d-character in the Unicode 2.0 standard.
====
So an 8-bit OSTA compressed Unicode buffer contains a sequence
of Unicode codepoints, one per 8 bits, which is effectively
equivalent to Latin1 (ISO-8859-1) encoding.
And 16-bit OSTA compressed Unicode is a sequence of Unicode codepoints,
one per 16 bits, in big endian, which is probably only UCS-2 and not full
UTF-16.
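To make this reading of the spec concrete, here is a minimal sketch of a
decoder that turns both forms into UTF-8. It is not GRUB's read_string();
the names put_utf8 and osta_cs0_to_utf8 are made up for illustration. It
assumes the first byte of the buffer is the CompressionID, treats every
16-bit unit as a plain UCS-2 codepoint (no surrogate handling), and
rejects any CompressionID other than 8 or 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append one BMP codepoint to out as UTF-8.
   Returns the number of bytes written (1 to 3). */
static size_t
put_utf8 (uint16_t cp, char *out)
{
  if (cp < 0x80)
    {
      out[0] = (char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (char) (0xC0 | (cp >> 6));
      out[1] = (char) (0x80 | (cp & 0x3F));
      return 2;
    }
  out[0] = (char) (0xE0 | (cp >> 12));
  out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (char) (0x80 | (cp & 0x3F));
  return 3;
}

/* Hypothetical decoder. raw[0] is assumed to be the CompressionID and
   the remaining sz - 1 bytes the character stream.
   out must have room for 3 * (sz - 1) + 1 bytes.
   Returns the UTF-8 length, or -1 for an unsupported CompressionID. */
static long
osta_cs0_to_utf8 (const uint8_t *raw, size_t sz, char *out)
{
  size_t i, n = 0;

  if (sz == 0 || (raw[0] != 8 && raw[0] != 16))
    return -1;

  if (raw[0] == 8)
    {
      /* One codepoint per byte: same mapping as Latin1. */
      for (i = 1; i < sz; i++)
        n += put_utf8 (raw[i], out + n);
    }
  else
    {
      /* One codepoint per big-endian 16-bit unit (UCS-2 only);
         a trailing odd byte is ignored. */
      for (i = 1; i + 1 < sz; i += 2)
        n += put_utf8 ((uint16_t) ((raw[i] << 8) | raw[i + 1]), out + n);
    }

  out[n] = '\0';
  return (long) n;
}
```

With this sketch the 8-bit byte 0xE9 comes out as the two UTF-8 bytes
C3 A9, i.e. it is treated as codepoint U+00E9 and not as part of a UTF-8
sequence.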
So the problem is with 8-bit OSTA compressed Unicode when it contains bytes
that are not UTF-8 invariants (ASCII), as those are decoded differently
under Latin1 and UTF-8.
(Please correct me if I'm wrong here)
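As an illustration of where the two readings diverge: the bytes C3 A9
are two characters (U+00C3, U+00A9) when read as Latin1, but the single
character U+00E9 when read as UTF-8. Only buffers made entirely of bytes
below 0x80 give the same result either way. A small sketch of that check,
with a made-up helper name (is_ascii_only is not a GRUB or libblkid
function):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every byte is ASCII (< 0x80), in
   which case the Latin1 and UTF-8 interpretations of the buffer yield
   the same string; returns 0 otherwise. */
static int
is_ascii_only (const uint8_t *s, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (s[i] >= 0x80)
      return 0;
  return 1;
}
```

A label for which this returns 0 is one where the two interpretations
would produce different strings.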
For now, please set this patch aside until we decide what to do
with it, given the different (wrong) implementation of string reading in
libblkid from util-linux.
--
Pali Rohár
pali.rohar@gmail.com