From: Pali Rohár
To: "Vladimir 'phcoder' Serbinenko"
Cc: The development of GNU GRUB
Subject: Re: [PATCH] * grub-core/fs/udf.c: Add support for UUID
Date: Fri, 12 May 2017 16:39:14 +0200
Message-Id: <201705121639.14849@pali>
References: <1491849330-10140-1-git-send-email-pali.rohar@gmail.com>

On Monday 08 May 2017 15:13:28 Vladimir 'phcoder' Serbinenko wrote:
> On Mon, Apr 10, 2017, 23:17 Pali Rohár wrote:
> > -read_string (const grub_uint8_t *raw, grub_size_t sz, char *outbuf)
> > +read_string (const grub_uint8_t *raw, grub_size_t sz, char *outbuf, int normalize_utf8)
>
> Normalize isn't the right word.
> And it's not utf-8 but latin1 (called
> compressed utf-16 by udf docs).
> Are you sure you handle utf-16 case correctly? What is the expected
> behavior in those cases? Ideally you may want to just parse raw
> string in caller

Hi!

I have now looked at the OSTA UDF specification again and found the
reason for my misinformation... libblkid implements 8-bit OSTA
Compressed Unicode wrongly, and I just tried to mimic libblkid in GRUB...

libblkid decodes 16-bit OSTA Compressed Unicode as UTF-16BE and 8-bit
OSTA Compressed Unicode as UTF-8.

The UDF 2.01 specification says:

====
For a CompressionID of 8 or 16, the value of the CompressionID shall
specify the number of BitsPerCharacter for the d-characters defined in
the CharacterBitStream field. Each sequence of CompressionID bits in the
CharacterBitStream field shall represent an OSTA Compressed Unicode
d-character. The bits of the character being encoded shall be added to
the CharacterBitStream from most- to least-significant-bit. The bits
shall be added to the CharacterBitStream starting from the most
significant bit of the current byte being encoded into. The value of
the OSTA Compressed Unicode d-character interpreted as a Uint16 defines
the value of the corresponding d-character in the Unicode 2.0 standard.
====

So an 8-bit OSTA Compressed Unicode buffer contains a sequence of
Unicode code points, one per 8 bits, which is effectively equivalent to
the Latin-1 (ISO-8859-1) encoding.

A 16-bit OSTA Compressed Unicode buffer contains a sequence of Unicode
code points, one per 16 bits, in big-endian order, which is probably
only UCS-2 and not full UTF-16.

So the problem is with 8-bit OSTA Compressed Unicode strings that
contain bytes which are not UTF-8 invariants (i.e. non-ASCII bytes), as
those are decoded differently as Latin-1 and as UTF-8. For example, the
byte 0xE9 is "é" (U+00E9) in Latin-1, but in UTF-8 it is only the lead
byte of a three-byte sequence.
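The decoding rule described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not GRUB's read_string: the names osta_cs0_to_utf8 and put_utf8 are hypothetical, plain <stdint.h> types stand in for grub_uint8_t/grub_size_t, only the CompressionID values 8 and 16 from the quoted UDF 2.01 text are accepted, and 16-bit strings are read as UCS-2BE without combining surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append code point CP (at most 0xFFFF) to OUT as
   UTF-8 and return the number of bytes written (1 to 3).  Surrogate
   values (0xD800-0xDFFF) are not paired up, matching the UCS-2 reading,
   so they would produce ill-formed UTF-8.  */
static size_t
put_utf8 (uint32_t cp, char *out)
{
  assert (cp <= 0xFFFF);
  if (cp < 0x80)
    {
      out[0] = (char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (char) (0xC0 | (cp >> 6));
      out[1] = (char) (0x80 | (cp & 0x3F));
      return 2;
    }
  out[0] = (char) (0xE0 | (cp >> 12));
  out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (char) (0x80 | (cp & 0x3F));
  return 3;
}

/* Hypothetical decoder for an OSTA Compressed Unicode string: RAW[0] is
   the CompressionID and the remaining SZ - 1 bytes are the
   CharacterBitStream.  Writes a NUL-terminated UTF-8 string to OUTBUF,
   which must hold at least 3 * SZ + 1 bytes.  Returns the UTF-8 length,
   or -1 for a CompressionID other than 8 or 16.  */
static long
osta_cs0_to_utf8 (const uint8_t *raw, size_t sz, char *outbuf)
{
  size_t i, n = 0;

  if (sz == 0)
    {
      outbuf[0] = '\0';
      return 0;
    }
  if (raw[0] == 8)
    {
      /* One code point per byte: effectively Latin-1, NOT UTF-8.  */
      for (i = 1; i < sz; i++)
        n += put_utf8 (raw[i], outbuf + n);
    }
  else if (raw[0] == 16)
    {
      /* One code point per 16 bits, most significant byte first
         (UCS-2BE); a trailing odd byte is ignored.  */
      for (i = 1; i + 1 < sz; i += 2)
        n += put_utf8 (((uint32_t) raw[i] << 8) | raw[i + 1], outbuf + n);
    }
  else
    return -1;

  outbuf[n] = '\0';
  return (long) n;
}
```

For example, the 8-bit input { 8, 'C', 'a', 'f', 0xE9 } and the 16-bit input { 16, 0x00, 'C', 0x00, 'a', 0x00, 'f', 0x00, 0xE9 } both decode to the UTF-8 bytes 43 61 66 C3 A9 ("Café"), whereas copying the 8-bit bytes verbatim as if they were UTF-8 would leave the lone byte 0xE9, which is not valid UTF-8.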
(Please correct me if I'm wrong here.)

For now, please put this patch of mine on hold until we decide what to
do about it, given that libblkid from util-linux reads these strings
differently (and wrongly).

-- 
Pali Rohár
pali.rohar@gmail.com