From: "Pali Rohár" <pali.rohar@gmail.com>
To: "Vladimir 'phcoder' Serbinenko" <phcoder@gmail.com>
Cc: The development of GNU GRUB <grub-devel@gnu.org>
Subject: Re: [PATCH] * grub-core/fs/udf.c: Add support for UUID
Date: Fri, 12 May 2017 16:39:14 +0200
Message-ID: <201705121639.14849@pali>
In-Reply-To: <CAEaD8JM7T8KPz1UFmJwsq7bNAKmAbWVYQFxpXTNZxzYz_6hr-w@mail.gmail.com>


On Monday 08 May 2017 15:13:28 Vladimir 'phcoder' Serbinenko wrote:
> On Mon, Apr 10, 2017, 23:17 Pali Rohár <pali.rohar@gmail.com> wrote:
> > -read_string (const grub_uint8_t *raw, grub_size_t sz, char *outbuf)
> > +read_string (const grub_uint8_t *raw, grub_size_t sz, char *outbuf, int normalize_utf8)
> 
> Normalize isn't the right word. And it's not utf-8 but latin1 (called
> compressed utf-16 by udf docs).
> Are you sure you handle utf-16 case correctly? What is the expected
> behavior in those cases? Ideally you may want to just parse raw
> string in caller

Hi! I have now looked at the OSTA UDF spec again and found the reason for
my misinformation: libblkid implements 8-bit OSTA Compressed Unicode
incorrectly, and I had simply tried to mimic libblkid in GRUB.

libblkid decodes 16-bit OSTA Compressed Unicode as UTF-16BE and 8-bit
OSTA Compressed Unicode as UTF-8.

The UDF 2.01 specification says:
====
For a CompressionID of 8 or 16, the value of the CompressionID shall 
specify the number of BitsPerCharacter for the d-characters defined in 
the CharacterBitStream field. Each sequence of CompressionID bits in the 
CharacterBitStream field shall represent an OSTA Compressed Unicode d-
character. The bits of the character being encoded shall be added to the 
CharacterBitStream from most- to least-significant-bit. The bits shall 
be added to the CharacterBitStream starting from the most significant 
bit of the current byte being encoded into. The value of the OSTA 
Compressed Unicode d-character interpreted as a Uint16 defines the value 
of the corresponding d-character in the Unicode 2.0 standard.
====
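A minimal sketch of what the quoted rule means in code, assuming the
caller passes the CompressionID byte followed by the CharacterBitStream
(with any trailing dstring length byte already stripped). The function
name `osta_cs0_to_units` is hypothetical and is not taken from GRUB or
libblkid; it only expands the stream into 16-bit Unicode values as the
quoted text describes and leaves the conversion to UTF-8 to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a GRUB or libblkid API): expand an OSTA
 * Compressed Unicode string into 16-bit Unicode values.
 *
 * raw[0] is the CompressionID; raw[1..len-1] is the CharacterBitStream.
 * Returns the number of values written to out, or -1 if the
 * CompressionID is not 8 or 16, the 16-bit stream has an odd length,
 * or out is too small.
 */
static long
osta_cs0_to_units (const uint8_t *raw, size_t len,
                   uint16_t *out, size_t out_cap)
{
  size_t i, n = 0;

  if (len == 0)
    return 0;

  if (raw[0] == 8)
    {
      /* 8 bits per character: each byte is the code point itself. */
      for (i = 1; i < len; i++)
        {
          if (n == out_cap)
            return -1;
          out[n++] = raw[i];
        }
    }
  else if (raw[0] == 16)
    {
      /* 16 bits per character, most significant byte first. */
      if ((len - 1) % 2 != 0)
        return -1;
      for (i = 1; i + 1 < len; i += 2)
        {
          if (n == out_cap)
            return -1;
          out[n++] = (uint16_t) ((raw[i] << 8) | raw[i + 1]);
        }
    }
  else
    return -1;

  return (long) n;
}
```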

So an 8-bit OSTA Compressed Unicode buffer contains a sequence of Unicode
code points, one per 8 bits, which is effectively equivalent to the
Latin1 (ISO-8859-1) encoding.
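For the 8-bit case, producing UTF-8 is therefore a plain Latin1 to UTF-8
conversion. A sketch with a hypothetical helper name, `latin1_to_utf8`,
assuming the CompressionID byte has already been removed:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert the payload of an 8-bit OSTA Compressed
 * Unicode string (CompressionID already removed) to UTF-8. Each byte is
 * a code point in U+0000..U+00FF, i.e. Latin1.
 *
 * out must have room for 2 * len + 1 bytes (worst case: every byte is
 * in 0x80..0xFF). Returns the number of bytes written, excluding NUL.
 */
static size_t
latin1_to_utf8 (const uint8_t *in, size_t len, char *out)
{
  size_t i, o = 0;

  for (i = 0; i < len; i++)
    {
      uint8_t c = in[i];

      if (c < 0x80)
        out[o++] = (char) c;                      /* ASCII: unchanged */
      else
        {
          out[o++] = (char) (0xC0 | (c >> 6));    /* lead: 0xC2 or 0xC3 */
          out[o++] = (char) (0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
  out[o] = '\0';
  return o;
}
```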

And 16-bit OSTA Compressed Unicode is a sequence of Unicode code points,
one per 16 bits in big-endian order, which is probably only UCS-2 and not
full UTF-16.
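A corresponding sketch for the 16-bit case, assuming UCS-2 big-endian
input as described above. The name `ucs2be_to_utf8` is hypothetical, and
replacing surrogate values with U+FFFD is a choice made for this example,
not something the quoted spec text prescribes; a decoder that accepts
full UTF-16 would combine surrogate pairs instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert the payload of a 16-bit OSTA Compressed
 * Unicode string (CompressionID already removed) to UTF-8, treating it
 * as UCS-2BE: one BMP code point per big-endian 16-bit unit.
 *
 * out must have room for 3 * (len / 2) + 1 bytes. Returns the number of
 * bytes written, excluding NUL. A trailing odd byte is ignored.
 */
static size_t
ucs2be_to_utf8 (const uint8_t *in, size_t len, char *out)
{
  size_t i, o = 0;

  for (i = 0; i + 1 < len; i += 2)
    {
      uint16_t cp = (uint16_t) ((in[i] << 8) | in[i + 1]);

      /* UCS-2 has no surrogate pairs; substitute U+FFFD (assumption). */
      if (cp >= 0xD800 && cp <= 0xDFFF)
        cp = 0xFFFD;

      if (cp < 0x80)
        out[o++] = (char) cp;                          /* 1 byte */
      else if (cp < 0x800)
        {
          out[o++] = (char) (0xC0 | (cp >> 6));        /* 2 bytes */
          out[o++] = (char) (0x80 | (cp & 0x3F));
        }
      else
        {
          out[o++] = (char) (0xE0 | (cp >> 12));       /* 3 bytes */
          out[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
          out[o++] = (char) (0x80 | (cp & 0x3F));
        }
    }
  out[o] = '\0';
  return o;
}
```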

So the problem is with 8-bit OSTA Compressed Unicode strings that contain
bytes which are not UTF-8 invariants (i.e. non-ASCII bytes, 0x80 to
0xFF), because those bytes are decoded differently as Latin1 and as
UTF-8.
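One way to detect the affected labels, sketched with a hypothetical
helper `osta8_is_utf8_invariant`: the Latin1 reading and a UTF-8 reading
of the same 8-bit payload give the same text exactly when every byte is
ASCII. Latin1 always yields one character per byte, whereas any valid
multibyte UTF-8 sequence yields fewer characters than bytes, and an
invalid sequence fails UTF-8 decoding altogether.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: returns 1 if the 8-bit OSTA Compressed Unicode
 * payload (CompressionID already removed) decodes identically as Latin1
 * and as UTF-8, i.e. every byte is ASCII; returns 0 otherwise.
 */
static int
osta8_is_utf8_invariant (const uint8_t *in, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (in[i] >= 0x80)
      return 0;   /* e.g. 0xE9: 'é' in Latin1, a 3-byte lead in UTF-8 */
  return 1;
}
```

For example, a label stored as the bytes `Caf\xE9` is "Café" according
to the spec, but those bytes are not valid UTF-8, so a reader that
follows libblkid's interpretation would mis-decode it.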

(Please correct me if I'm wrong here)

For now, please consider this patch suspended until we decide what to do
with it, given that libblkid (from util-linux) reads these strings
differently, and incorrectly.

-- 
Pali Rohár
pali.rohar@gmail.com

