public inbox for linux-kernel@vger.kernel.org
From: Pat LaVarre <p.lavarre@ieee.org>
To: gerrit.scholl@philips.com
Cc: linux-kernel@vger.kernel.org, linux_udf@hpesjro.fc.hp.com
Subject: Re: Unable to read UDF fs on a DVD
Date: 27 Apr 2004 13:48:00 -0600
Message-ID: <1083095280.6562.88.camel@patibmrh9>
In-Reply-To: <1083082286.6562.55.camel@patibmrh9>

> compression id 16 (search for "cid:"),
> means that the characters are coded 16 bits per character.
> 
> UDF 2.1.1:
> UDF supports standard Unicode 2.0 except the 'byte-order mark'
> chars #FEFF and #FFFE.
> These characters are coded in OSTA Compressed Unicode format,
> which means 8 bits per char or 16 bits per char.
> If a file identifier contains only unicode chars with a value
> less than #0100, compression id 8 can be used.

Link! Thank you. I clicked thru to:

--- http://www.osta.org/specs/pdf/udf250.pdf
--- (page 17 of 165)

2.1.1 Character Sets

The character set used by UDF for the structures defined in this
document is the ... OSTA CS0 character set ... defined as follows:
...

---

Between your English and their English, I conclude:

I should expect to see 8 or 16 bits per char.  Specifically, when I'm
looking at hex bytes, if the compression id byte is x08 then I should
see 8 bits per char thereafter, but if it is x10 then I should see
16 (x10) bits per char thereafter.
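
To make that reading rule concrete, here is a minimal sketch in Python,
not taken from the spec or from the kernel's UDF code.  The function
name decode_cs0 is hypothetical, and I am assuming that the 16 bit form
stores each character most significant byte first.

```python
def decode_cs0(raw: bytes) -> str:
    """Decode a byte string laid out as: one compression id byte, then chars.

    Hypothetical helper for reading hexdumps; assumes cid 16 characters
    are stored most significant byte first.
    """
    if not raw:
        return ""
    cid, body = raw[0], raw[1:]
    if cid == 8:
        # 8 bits per char: each byte is one code point in x00..FF.
        return body.decode("latin-1")
    if cid == 16:
        # 16 bits per char: take the bytes two at a time.
        if len(body) % 2:
            raise ValueError("odd byte count after compression id 16")
        return "".join(
            chr(int.from_bytes(body[i:i + 2], "big"))
            for i in range(0, len(body), 2)
        )
    raise ValueError(f"unexpected compression id {cid}")
```

For example, decode_cs0(b"\x08abc") reads three 8 bit chars, while
decode_cs0(b"\x10\x00a\x00b") reads two 16 bit chars.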

That sure sounds easier than UTF-8 to decode visually from a hexdump.
For example, I now think that with "OSTA Compressed Unicode", also known
as the "OSTA CS0 character set", the € "EURO SIGN" (x20AC, which UTF-8
spells as $'\xE2\x82\xAC') will always appear under cid 16 as the plain
hex byte pair x20 xAC.
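
A quick way to check that expectation is to print both byte sequences
side by side.  This is only a sketch: encode_cs0 is a hypothetical
helper, not an API from any UDF tool, and it assumes the same layout as
above (one compression id byte, then 8 bit chars, or 16 bit chars most
significant byte first).

```python
def encode_cs0(text: str) -> bytes:
    """Hypothetical encoder: pick cid 8 when every char is below x0100."""
    if all(ord(ch) < 0x100 for ch in text):
        return b"\x08" + text.encode("latin-1")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("char does not fit in 16 bits")
    # cid 16: two bytes per char, most significant byte first (assumed).
    return b"\x10" + b"".join(ord(ch).to_bytes(2, "big") for ch in text)


euro = "\u20ac"  # EURO SIGN
print(euro.encode("utf-8").hex(" "))  # e2 82 ac
print(encode_cs0(euro).hex(" "))      # 10 20 ac
```

The UTF-8 bytes hide the x20AC code point behind bit shuffling, while
the cid 16 bytes show it directly after the leading x10.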

With this much context in place, the 2004-04-23 guess of "a problem
with 16 bit characters vs 8 bit characters" now makes sense.  That guess
says cid 8 may work better than cid 16, perhaps especially when we need
cid 16 to express a char outside the x00..FF range.

Pat LaVarre


