From: "Pali Rohár" <pali.rohar@gmail.com>
To: "Jan Kara" <jack@suse.com>, "Steve Kenton" <skenton@ou.edu>,
	"Vojtěch Vladyka" <xvlady00@stud.feec.vutbr.cz>,
	"Karel Zak" <kzak@redhat.com>
Cc: util-linux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: UDF & dstring
Date: Sun, 11 Jun 2017 17:10:02 +0200
Message-ID: <201706111710.03013@pali>

Hi!

I read the UDF specification again and found another cryptic part:

=====
2.1.3 Dstrings

The ECMA 167 standard, as well as this document, has normally defined 
byte positions relative to 0. In section 7.2.12 of ECMA 167, dstrings 
are defined in terms of being relative to 1. Since this offers an 
opportunity for confusion, the following shows what the definition would 
be if described relative to 0.

7.2.12 Fixed-length character fields

A dstring of length n is a field of n bytes where d-characters (1/7.2) 
are recorded. The number of bytes used to record the characters shall be 
recorded as a Uint8 (1/7.1.1) in byte n-1, where n is the length of the 
field. The characters shall be recorded starting with the first byte of 
the field, and any remaining byte positions after the characters up 
until byte n-2 inclusive shall be set to #00.

If the number of d-characters to be encoded is zero, the length of the 
dstring shall be zero.

NOTE: The length of a dstring includes the compression code byte (2.1.1) 
except for the case of a zero length string. A zero length string shall 
be recorded by setting the entire dstring field to all zeros.
=====

Next, the preceding section 2.1.1 (Character Sets) contains the
Compression Algorithm table, in which IDs 0-7 are reserved.

I'm not sure how to correctly interpret those sections.

Does it mean that every dstring should consist of the following buffer?

L - length of encoded characters
N - size of dstring buffer

buffer (byte offsets relative to 0):
         byte 0: 0x08 (for Latin1) or 0x10 (for UCS-2BE)
    bytes 1 - L: encoded characters (data either in Latin1 or UCS-2BE)
bytes L+1 - N-2: 0x00
       byte N-1: number L+1

And in the special case when L = 0, are the first and last bytes also
zero?
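
To make the question concrete, here is a minimal sketch of an encoder
for the reading described above, with byte offsets relative to 0. The
function name encode_dstring and the choice to fall back from Latin1 to
UCS-2BE are my own assumptions for illustration; this is not taken from
the kernel, blkid or mkudffs.

```python
def encode_dstring(text: str, size: int) -> bytes:
    """Encode text into a dstring field of `size` bytes.

    Assumed layout (one reading of UDF 2.1.3, not a confirmed one):
    byte 0 is the compression ID, the encoded characters follow,
    zero padding runs up to byte size-2, and byte size-1 holds the
    number of used bytes including the compression ID.
    """
    if size < 2:
        raise ValueError("field too small for a dstring")
    if not text:
        # Zero-length string: the whole field is zero.
        return bytes(size)
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside UCS-2 range")
    try:
        payload = text.encode("latin-1")
        comp_id = 0x08
    except UnicodeEncodeError:
        # For code points up to 0xFFFF, UTF-16BE matches UCS-2BE.
        payload = text.encode("utf-16-be")
        comp_id = 0x10
    used = 1 + len(payload)  # compression ID byte + encoded characters
    if used > size - 1 or used > 255:
        raise ValueError("string does not fit into the field")
    buf = bytearray(size)  # already zero-filled, so padding is in place
    buf[0] = comp_id
    buf[1:used] = payload
    buf[size - 1] = used
    return bytes(buf)
```

With this reading, encode_dstring("ab", 8) gives the bytes
08 61 62 00 00 00 00 03, and an empty string gives an all-zero field.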

I ask because the kernel udf driver, the util-linux blkid library and
mkudffs from udftools currently each implement this differently.

None of those implementations accepts a fully empty buffer as a valid
dstring.

mkudffs stores in the last byte the length of the encoded characters + 1
(for the compression ID), as written above. On the other hand, blkid
from util-linux thinks that the last byte is part of the encoded
characters, and the Linux kernel driver does not set the last byte to
any value.

So... how should that part of the UDF specification be understood?
Should the last byte be set to the length of the encoded characters + 1
or not? And should a fully empty buffer (also with the compression ID
set to 0x00, which is reserved) be treated as a valid (empty) string?
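
For discussion, here is a minimal sketch of a reader that answers "yes"
to both questions: it takes the last byte as the used length (including
the compression ID) and accepts a zero length byte as an empty string.
The name decode_dstring and the lenient handling of the empty case are
assumptions of mine, not the behaviour of any of the three
implementations.

```python
def decode_dstring(field: bytes) -> str:
    """Decode a dstring field, trusting the length stored in the last byte.

    Assumption: the last byte counts the compression ID plus the encoded
    characters, and a length of zero means an empty string.
    """
    size = len(field)
    if size < 2:
        raise ValueError("field too small for a dstring")
    used = field[size - 1]
    if used == 0:
        # Lenient choice: a zero length is read as the empty string,
        # so an all-zero field (reserved compression ID 0x00) is valid.
        return ""
    if used > size - 1:
        raise ValueError("recorded length exceeds the field")
    comp_id = field[0]
    payload = field[1:used]
    if comp_id == 0x08:
        return payload.decode("latin-1")
    if comp_id == 0x10:
        if len(payload) % 2:
            raise ValueError("odd number of bytes for 16-bit characters")
        return payload.decode("utf-16-be")
    raise ValueError("reserved or unknown compression ID")
```

A reader that ignores the last byte, or treats it as character data,
would disagree with this one exactly in the cases asked about above.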

And... we should unify the implementations in blkid, the kernel udf
driver and mkudffs.

-- 
Pali Rohár
pali.rohar@gmail.com
