From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pl0-f50.google.com ([209.85.160.50]:35290 "EHLO
	mail-pl0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752948AbdKWSgk (ORCPT ); Thu, 23 Nov 2017 13:36:40 -0500
Received: by mail-pl0-f50.google.com with SMTP id 62so3063525plc.2
	for ; Thu, 23 Nov 2017 10:36:39 -0800 (PST)
Message-ID: <1511462197.2541.24.camel@dubeyko.com>
Subject: Re: [PATCH] hfsplus: fix the bug that cannot recognize files with hangul file name
From: Viacheslav Dubeyko
To: Ernesto A. Fernández, tchou
Cc: linux-fsdevel@vger.kernel.org, linux-fsdevel-owner@vger.kernel.org,
	htl10@users.sourceforge.net
Date: Thu, 23 Nov 2017 10:36:37 -0800
In-Reply-To: <20171123113230.GA5581@debian.home>
References: <1510906805-2142-1-git-send-email-tchou@synology.com>
	<20171119005704.GA3495@debian.home>
	<080024a85dc413b72c181c6e75bdc736@synology.com>
	<20171123113230.GA5581@debian.home>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote:
> Hi:
>
> your issue seems to be in the decomposition of hangul characters, not in
> the recomposition before printing. The hfsplus module on linux is saving
> the name of your actor as AC F5 C7 20, without performing any
> decomposition at all.
>
> The reason your patch hides the bug is because it causes linux to present
> filenames as decomposed utf8, so it is not necessary to decompose again
> before working with them. But the issue is still there, and you will most
> likely run into trouble if you make a hangul filename in linux and try
> to work with it in MacOS.
>
> Reviewing the code it would seem that the developers completely forgot
> the hangul characters had their own rules for decomposition. It's weird
> because they did the composition part correctly.
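
For reference, the arithmetic behind the Hangul decomposition rules
mentioned above can be sketched in plain C. This is a minimal userspace
sketch, not kernel code: the function name hangul_decompose_demo and the
HANGUL_* constant names are made up for illustration, and the constant
values are assumed to be the ones from the Hangul syllable decomposition
algorithm in the Unicode Standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, taken from the Unicode Standard's Hangul algorithm. */
enum {
	HANGUL_SBASE  = 0xAC00,	/* first precomposed syllable */
	HANGUL_LBASE  = 0x1100,	/* first leading consonant */
	HANGUL_VBASE  = 0x1161,	/* first vowel */
	HANGUL_TBASE  = 0x11A7,	/* one before the first trailing consonant */
	HANGUL_VCOUNT = 21,
	HANGUL_TCOUNT = 28,
	HANGUL_NCOUNT = HANGUL_VCOUNT * HANGUL_TCOUNT,	/* 588 */
	HANGUL_SCOUNT = 19 * HANGUL_NCOUNT		/* 11172 */
};

/*
 * Hypothetical helper: splits one precomposed Hangul syllable into its
 * conjoining jamo. Returns the number of code units written to out
 * (2 or 3), or 0 if uc is not a precomposed Hangul syllable.
 */
int hangul_decompose_demo(uint32_t uc, uint16_t out[3])
{
	int index, t;

	assert(out != NULL);
	if (uc < HANGUL_SBASE || uc >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	index = (int)(uc - HANGUL_SBASE);
	out[0] = (uint16_t)(HANGUL_LBASE + index / HANGUL_NCOUNT);
	out[1] = (uint16_t)(HANGUL_VBASE +
			    (index % HANGUL_NCOUNT) / HANGUL_TCOUNT);

	t = index % HANGUL_TCOUNT;
	if (t == 0)
		return 2;	/* syllable has no trailing consonant */

	out[2] = (uint16_t)(HANGUL_TBASE + t);
	return 3;
}
```

With these constants, the two code units quoted above, AC F5 and C7 20,
come out as 1100 1169 11BC and 110B 1172 respectively, and a non-Hangul
character such as U+0041 returns 0.
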
>
> I've made a quick draft of a patch, mostly by copying the code provided
> in the unicode web. I don't think we can actually use it on a

Could you please share the link for "the unicode web"?

Thanks,
Vyacheslav Dubeyko.

> release,
> but it should be enough to check if I'm right. It works fine on linux,
> but I don't have a mac, so it would be great if you could test it for me.
>
> Thanks,
> Ernest
>
> (By the way, there is no reason you should have to use the nodecompose
> mount option, as the other reviewer suggested. Using that option will
> have a similar effect to that of your patch. It will hide the problem,
> but if you create a hangul filename on linux with that option you
> probably won't be able to use it on a mac.)
>
> ---
> diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> index dfa90c2..9006c61 100644
> --- a/fs/hfsplus/unicode.c
> +++ b/fs/hfsplus/unicode.c
> @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
>  	return size;
>  }
>  
> -/* Decomposes a single unicode character. */
> +/* Decomposes a single non-Hangul unicode character. */
>  static inline u16 *decompose_unichar(wchar_t uc, int *size)
>  {
>  	int off;
> @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size)
>  	return hfsplus_decompose_table + (off / 4);
>  }
>  
> +/* Decomposes a Hangul unicode character. */
> +int decompose_hangul(wchar_t uc, u16 *result)
> +{
> +	int index;
> +	int l, v, t;
> +
> +	index = uc - Hangul_SBase;
> +	if (index < 0 || index >= Hangul_SCount)
> +		return 0;
> +
> +	l = Hangul_LBase + index / Hangul_NCount;
> +	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
> +	t = Hangul_TBase + index % Hangul_TCount;
> +
> +	result[0] = l;
> +	result[1] = v;
> +	if (t != Hangul_TBase) {
> +		result[2] = t;
> +		return 3;
> +	}
> +	return 2;
> +}
> +
>  int hfsplus_asc2uni(struct super_block *sb,
>  		    struct hfsplus_unistr *ustr, int max_unistr_len,
>  		    const char *astr, int len)
> @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb,
>  	int size, dsize, decompose;
>  	u16 *dstr, outlen = 0;
>  	wchar_t c;
> +	u16 hangul_buf[3];
>  
>  	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
>  	while (outlen < max_unistr_len && len > 0) {
>  		size = asc2unichar(sb, astr, len, &c);
>  
> -		if (decompose)
> -			dstr = decompose_unichar(c, &dsize);
> -		else
> +		if (decompose) {
> +			/* Hangul is handled separately */
> +			dstr = &hangul_buf[0];
> +			dsize = decompose_hangul(c, dstr);
> +			if (dsize == 0)
> +				/* not Hangul */
> +				dstr = decompose_unichar(c, &dsize);
> +		} else {
>  			dstr = NULL;
> +		}
> +
>  		if (dstr) {
>  			if (outlen + dsize > max_unistr_len)
>  				break;