From: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>
To: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: "Ernesto A. Fernández" <ernesto.mnd.fernandez@gmail.com>,
tchou <tchou@synology.com>,
linux-fsdevel@vger.kernel.org,
linux-fsdevel-owner@vger.kernel.org, htl10@users.sourceforge.net
Subject: Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name
Date: Thu, 23 Nov 2017 19:20:11 -0300
Message-ID: <20171123222009.GA1269@debian.home>
In-Reply-To: <1511462197.2541.24.camel@dubeyko.com>

On Thu, Nov 23, 2017 at 10:36:37AM -0800, Viacheslav Dubeyko wrote:
> On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote:
> > Hi:
> >
> > your issue seems to be in the decomposition of hangul characters, not
> > in the recomposition before printing. The hfsplus module on linux is
> > saving the name of your actor as AC F5 C7 20, without performing any
> > decomposition at all.
> >
> > The reason your patch hides the bug is because it causes linux to
> > present filenames as decomposed utf8, so it is not necessary to
> > decompose again before working with them. But the issue is still
> > there, and you will most likely run into trouble if you make a hangul
> > filename in linux and try to work with it in MacOS.
> >
> > Reviewing the code it would seem that the developers completely
> > forgot the hangul characters had their own rules for decomposition.
> > It's weird because they did the composition part correctly.
> >
> > I've made a quick draft of a patch, mostly by copying the code
> > provided in the unicode web. I don't think we can actually use it on a
>
>
> Could you please share the link for "the unicode web"?
>
> Thanks,
> Vyacheslav Dubeyko.
I'm not asking for a review yet, only for testing, since I don't have
a Mac. With that clear, the latest version of the Unicode Standard is:
www.unicode.org/versions/Unicode10.0.0/
You want section 3.12, "Conjoining Jamo Behavior", which covers the
decomposition of Hangul syllables.
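
For reference, the rule in section 3.12 boils down to a few integer
divisions. Below is a minimal user-space sketch of it, not the kernel
code: the function name hangul_decompose and the stdint types are my own
choices for illustration. The Hangul_* constant names follow the patch,
and the values are the ones listed in the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as listed in section 3.12 of the Unicode Standard. */
#define Hangul_SBase	0xAC00u	/* first precomposed syllable */
#define Hangul_LBase	0x1100u	/* first leading consonant */
#define Hangul_VBase	0x1161u	/* first vowel */
#define Hangul_TBase	0x11A7u	/* one before the first trailing consonant */
#define Hangul_LCount	19u
#define Hangul_VCount	21u
#define Hangul_TCount	28u
#define Hangul_NCount	(Hangul_VCount * Hangul_TCount)	/* 588 */
#define Hangul_SCount	(Hangul_LCount * Hangul_NCount)	/* 11172 */

/*
 * Hypothetical user-space helper, for illustration only.
 * Writes the conjoining jamo of cp into out and returns how many were
 * written (2 or 3), or 0 if cp is not a precomposed Hangul syllable.
 */
static int hangul_decompose(uint32_t cp, uint16_t out[3])
{
	uint32_t index;

	assert(out != NULL);
	if (cp < Hangul_SBase || cp >= Hangul_SBase + Hangul_SCount)
		return 0;
	index = cp - Hangul_SBase;

	out[0] = (uint16_t)(Hangul_LBase + index / Hangul_NCount);
	out[1] = (uint16_t)(Hangul_VBase +
			    (index % Hangul_NCount) / Hangul_TCount);
	if (index % Hangul_TCount == 0)
		return 2;	/* no trailing consonant */
	out[2] = (uint16_t)(Hangul_TBase + index % Hangul_TCount);
	return 3;
}
```

If the AC F5 C7 20 quoted earlier is read as U+ACF5 U+C720 (that byte
order is my assumption), the first syllable should come out as
1100 1169 11BC and the second as 110B 1172, with no trailing consonant.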
>
>
> > release, but it should be enough to check if I'm right. It works fine
> > on linux, but I don't have a mac, so it would be great if you could
> > test it for me.
> >
> > Thanks,
> > Ernest
> >
> > (By the way, there is no reason you should have to use the
> > nodecompose mount option, as the other reviewer suggested. Using that
> > option will have a similar effect to that of your patch. It will hide
> > the problem, but if you create a hangul filename on linux with that
> > option you probably won't be able to use it on a mac.)
> >
> > ---
> > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> > index dfa90c2..9006c61 100644
> > --- a/fs/hfsplus/unicode.c
> > +++ b/fs/hfsplus/unicode.c
> > @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
> >  	return size;
> >  }
> >
> > -/* Decomposes a single unicode character. */
> > +/* Decomposes a single non-Hangul unicode character. */
> >  static inline u16 *decompose_unichar(wchar_t uc, int *size)
> >  {
> >  	int off;
> > @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size)
> >  	return hfsplus_decompose_table + (off / 4);
> >  }
> >
> > +/* Decomposes a Hangul unicode character. */
> > +int decompose_hangul(wchar_t uc, u16 *result)
> > +{
> > +	int index;
> > +	int l, v, t;
> > +
> > +	index = uc - Hangul_SBase;
> > +	if (index < 0 || index >= Hangul_SCount)
> > +		return 0;
> > +
> > +	l = Hangul_LBase + index / Hangul_NCount;
> > +	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
> > +	t = Hangul_TBase + index % Hangul_TCount;
> > +
> > +	result[0] = l;
> > +	result[1] = v;
> > +	if (t != Hangul_TBase) {
> > +		result[2] = t;
> > +		return 3;
> > +	}
> > +	return 2;
> > +}
> > +
> >  int hfsplus_asc2uni(struct super_block *sb,
> >  		    struct hfsplus_unistr *ustr, int max_unistr_len,
> >  		    const char *astr, int len)
> > @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb,
> >  	int size, dsize, decompose;
> >  	u16 *dstr, outlen = 0;
> >  	wchar_t c;
> > +	u16 hangul_buf[3];
> >
> >  	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
> >  	while (outlen < max_unistr_len && len > 0) {
> >  		size = asc2unichar(sb, astr, len, &c);
> >
> > -		if (decompose)
> > -			dstr = decompose_unichar(c, &dsize);
> > -		else
> > +		if (decompose) {
> > +			/* Hangul is handled separately */
> > +			dstr = &hangul_buf[0];
> > +			dsize = decompose_hangul(c, dstr);
> > +			if (dsize == 0)
> > +				/* not Hangul */
> > +				dstr = decompose_unichar(c, &dsize);
> > +		} else {
> >  			dstr = NULL;
> > +		}
> > +
> >  		if (dstr) {
> >  			if (outlen + dsize > max_unistr_len)
> >  				break;
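
One check that does not need a Mac is a round trip: decompose every
precomposed syllable and compose the jamo back with the inverse formula
from section 3.12 of the Unicode Standard. A sketch follows, in user
space and with hypothetical function names (decompose_syllable,
compose_syllable, roundtrip_failures) that are not part of the patch. It
only validates the arithmetic and says nothing about how macOS handles
the resulting names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants as listed in section 3.12 of the Unicode Standard. */
#define SBASE	0xAC00u
#define LBASE	0x1100u
#define VBASE	0x1161u
#define TBASE	0x11A7u
#define LCOUNT	19u
#define VCOUNT	21u
#define TCOUNT	28u
#define NCOUNT	(VCOUNT * TCOUNT)	/* 588 */
#define SCOUNT	(LCOUNT * NCOUNT)	/* 11172 */

/* Returns the number of jamo written (2 or 3), or 0 if s is not Hangul. */
static int decompose_syllable(uint32_t s, uint16_t out[3])
{
	uint32_t index;

	assert(out != NULL);
	if (s < SBASE || s >= SBASE + SCOUNT)
		return 0;
	index = s - SBASE;
	out[0] = (uint16_t)(LBASE + index / NCOUNT);
	out[1] = (uint16_t)(VBASE + (index % NCOUNT) / TCOUNT);
	if (index % TCOUNT == 0)
		return 2;
	out[2] = (uint16_t)(TBASE + index % TCOUNT);
	return 3;
}

/* Composes L V or L V T back into a syllable; returns 0 if invalid. */
static uint32_t compose_syllable(const uint16_t *jamo, int n)
{
	uint32_t l, v, t = 0;

	assert(jamo != NULL);
	if (n < 2 || n > 3)
		return 0;
	if (jamo[0] < LBASE || jamo[0] >= LBASE + LCOUNT)
		return 0;
	if (jamo[1] < VBASE || jamo[1] >= VBASE + VCOUNT)
		return 0;
	l = jamo[0] - LBASE;
	v = jamo[1] - VBASE;
	if (n == 3) {
		/* TBASE itself is not a trailing consonant */
		if (jamo[2] <= TBASE || jamo[2] >= TBASE + TCOUNT)
			return 0;
		t = jamo[2] - TBASE;
	}
	return SBASE + (l * VCOUNT + v) * TCOUNT + t;
}

/* Counts syllables that do not survive decompose followed by compose. */
static unsigned int roundtrip_failures(void)
{
	unsigned int failures = 0;
	uint16_t jamo[3];
	uint32_t s;

	for (s = SBASE; s < SBASE + SCOUNT; s++) {
		int n = decompose_syllable(s, jamo);

		if (n == 0 || compose_syllable(jamo, n) != s)
			failures++;
	}
	return failures;
}
```

A return value of zero from roundtrip_failures() means all 11172
syllables survive the round trip.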