linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08  2:19 Hin-Tak Leung
  2014-04-08  7:05 ` Vyacheslav Dubeyko
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08  2:19 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Anton Altaparmakov, Hin-Tak Leung, Anton Altaparmakov,
	Vyacheslav Dubeyko, Al Viro, Christoph Hellwig

From: Hin-Tak Leung <htl10@users.sourceforge.net>

The HFS Plus Volume Format specification (TN1150) states that
file names are stored internally as a maximum of 255 unicode
characters, as defined by The Unicode Standard, Version 2.0
[Unicode, Inc. ISBN 0-201-48345-9]. File names are converted by
the NLS system on Linux before presented to the user.

255 CJK characters converts to UTF-8 with 1 unicode character
to up to 3 bytes, and to GB18030 with 1 unicode character to
up to 4 bytes. Thus, trying in a UTF-8 locale to list files
with names of more than 85 CJK characters results in:

    $ ls /mnt
    ls: reading directory /mnt: File name too long

The receiving buffer needs to be 255 x NLS_MAX_CHARSET_SIZE bytes,
not 255 bytes as the code has always been.

Similar consideration applies to attributes, which are stored
internally as a maximum of 127 UTF-16BE units in HFS+.

Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
is not attainable in the case of conversion to UTF-8, as that
requires the use of surrogate pairs, i.e. consuming two storage units.

Thanks Anton Altaparmakov for reviewing an earlier version of
this change.

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
CC: Anton Altaparmakov <anton@tuxera.com>
CC: Vyacheslav Dubeyko <slava@dubeyko.com>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Christoph Hellwig <hch@infradead.org>
---
 fs/hfsplus/dir.c   | 12 ++++++++++--
 fs/hfsplus/xattr.c | 15 ++++++++++++---
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index bdec665..f95abef 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -12,6 +12,7 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/random.h>
+#include <linux/nls.h>
 
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"
@@ -127,7 +128,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
 	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	int len, err;
-	char strbuf[HFSPLUS_MAX_STRLEN + 1];
+	char *strbuf;
 	hfsplus_cat_entry entry;
 	struct hfs_find_data fd;
 	struct hfsplus_readdir_data *rd;
@@ -139,6 +140,11 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
 	err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
 	if (err)
 		return err;
+	strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
+	if (!strbuf) {
+		err = -ENOMEM;
+		goto failed_strbuf_alloc;
+	}
 	hfsplus_cat_build_key(sb, fd.search_key, inode->i_ino, NULL);
 	err = hfs_brec_find(&fd, hfs_find_rec_by_key);
 	if (err)
@@ -193,7 +199,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
 		hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
 			fd.entrylength);
 		type = be16_to_cpu(entry.type);
-		len = HFSPLUS_MAX_STRLEN;
+		len = NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN;
 		err = hfsplus_uni2asc(sb, &fd.key->cat.name, strbuf, &len);
 		if (err)
 			goto out;
@@ -246,6 +252,8 @@ next:
 	}
 	memcpy(&rd->key, fd.key, sizeof(struct hfsplus_cat_key));
 out:
+	kfree(strbuf);
+failed_strbuf_alloc:
 	hfs_find_exit(&fd);
 	return err;
 }
diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
index 4e27edc..cfbb23d 100644
--- a/fs/hfsplus/xattr.c
+++ b/fs/hfsplus/xattr.c
@@ -8,6 +8,7 @@
 
 #include "hfsplus_fs.h"
 #include <linux/posix_acl_xattr.h>
+#include <linux/nls.h>
 #include "xattr.h"
 #include "acl.h"
 
@@ -645,8 +646,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
 	struct hfs_find_data fd;
 	u16 key_len = 0;
 	struct hfsplus_attr_key attr_key;
-	char strbuf[HFSPLUS_ATTR_MAX_STRLEN +
-			XATTR_MAC_OSX_PREFIX_LEN + 1] = {0};
+	char *strbuf;
 	int xattr_name_len;
 
 	if ((!S_ISREG(inode->i_mode) &&
@@ -666,6 +666,13 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
 		return err;
 	}
 
+	strbuf = kzalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN +
+			XATTR_MAC_OSX_PREFIX_LEN + 1, GFP_KERNEL);
+	if (!strbuf) {
+		res = -ENOMEM;
+		goto failed_strbuf_alloc;
+	}
+
 	err = hfsplus_find_attr(inode->i_sb, inode->i_ino, NULL, &fd);
 	if (err) {
 		if (err == -ENOENT) {
@@ -692,7 +699,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
 		if (be32_to_cpu(attr_key.cnid) != inode->i_ino)
 			goto end_listxattr;
 
-		xattr_name_len = HFSPLUS_ATTR_MAX_STRLEN;
+		xattr_name_len = NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN;
 		if (hfsplus_uni2asc(inode->i_sb,
 			(const struct hfsplus_unistr *)&fd.key->attr.key_name,
 					strbuf, &xattr_name_len)) {
@@ -718,6 +725,8 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
 	}
 
 end_listxattr:
+	kfree(strbuf);
+failed_strbuf_alloc:
 	hfs_find_exit(&fd);
 	return res;
 }
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08  2:19 [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes Hin-Tak Leung
@ 2014-04-08  7:05 ` Vyacheslav Dubeyko
  2014-04-08 11:56 ` Sergei Antonov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-08  7:05 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: linux-fsdevel, Anton Altaparmakov, Hin-Tak Leung,
	Anton Altaparmakov, Al Viro, Christoph Hellwig

Hi Hin-Tak,

On Tue, 2014-04-08 at 03:19 +0100, Hin-Tak Leung wrote:

[snip]

> @@ -139,6 +140,11 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
>  	err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
>  	if (err)
>  		return err;
> +	strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);

Why do you prefer kmalloc instead of kmem_cache_alloc? I suppose that
operations with names are frequent operations. So, it makes sense to use
look-aside cache, from my viewpoint. What do you think?

Maybe it makes sense to define special constants for
(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN) and for
(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN)?

[snip]

> diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
> index 4e27edc..cfbb23d 100644
> --- a/fs/hfsplus/xattr.c
> +++ b/fs/hfsplus/xattr.c
> @@ -8,6 +8,7 @@
>  
>  #include "hfsplus_fs.h"
>  #include <linux/posix_acl_xattr.h>
> +#include <linux/nls.h>
>  #include "xattr.h"
>  #include "acl.h"
>  
> @@ -645,8 +646,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
>  	struct hfs_find_data fd;
>  	u16 key_len = 0;
>  	struct hfsplus_attr_key attr_key;
> -	char strbuf[HFSPLUS_ATTR_MAX_STRLEN +
> -			XATTR_MAC_OSX_PREFIX_LEN + 1] = {0};

I think that you missed likewise cases in:

(1) hfsplus_attr_build_key()
(http://lxr.free-electrons.com/source/fs/hfsplus/attributes.c#L58)

(2) xattr_security.c
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L17)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L23)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L35)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L41)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L65)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L76)

(3) xattr_trusted.c
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L15)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L21)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L33)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L39)

(4) xattr_user.c
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L15)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L21)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L33)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L39)

(5) hfsplus_osx_getxattr()
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L800)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L807)

(6) hfsplus_osx_setxattr()
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L823)
(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L830)

Thanks,
Vyacheslav Dubeyko.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08  2:19 [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes Hin-Tak Leung
  2014-04-08  7:05 ` Vyacheslav Dubeyko
@ 2014-04-08 11:56 ` Sergei Antonov
  2014-04-08 15:06 ` Anton Altaparmakov
  2014-04-08 16:59 ` Sergei Antonov
  3 siblings, 0 replies; 23+ messages in thread
From: Sergei Antonov @ 2014-04-08 11:56 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: linux-fsdevel@vger.kernel.org, Anton Altaparmakov, Hin-Tak Leung,
	Anton Altaparmakov, Vyacheslav Dubeyko, Al Viro,
	Christoph Hellwig

On 8 April 2014 04:19, Hin-Tak Leung <hintak.leung@gmail.com> wrote:

> Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
> is not attainable in the case of conversion to UTF-8, as that
> requires the use of surrogate pairs, i.e. consuming two storage units.

True that 6 is not attainable, wrong that it is with surrogate pairs.
A surrogate pair encodes code-points from U+10000 to U+10FFFF, which
is 4 bytes in UTF-8 (a moderate 2 per one UTF-16 code-unit).

Multiplier 3 is enough for all cases of UTF-16/UCS-2 to UTF-8 conversion.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08 13:37 Hin-Tak Leung
  0 siblings, 0 replies; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08 13:37 UTC (permalink / raw)
  To: slava; +Cc: linux-fsdevel, aia21, anton, viro, hch

------------------------------
On Tue, Apr 8, 2014 8:05 AM BST Vyacheslav Dubeyko wrote:

>Hi Hin-Tak,
>
>On Tue, 2014-04-08 at 03:19 +0100, Hin-Tak Leung wrote:
>
>[snip]
>
> @@ -139,6 +140,11 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
>      err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
>      if (err)
>          return err;
> +    strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
>
>Why do you prefer kmalloc instead of kmem_cache_alloc? I suppose that
>operations with names are frequent operations. So, it makes sense to use
>look-aside cache, from my viewpoint. What do you think?
>

I just wasn't sure which *alloc does what :-). Even kmalloc vs kzalloc vz kcalloc
took a while to find. (and kcalloc isn't the kernel equivalent of calloc, kzalloc is).

Any reference to which *alloc does what?

>Maybe it makes sense to define special constants for
>(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN) and for
>(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN)?
>
>[snip]
>
> diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
> index 4e27edc..cfbb23d 100644
> --- a/fs/hfsplus/xattr.c
> +++ b/fs/hfsplus/xattr.c
> @@ -8,6 +8,7 @@
>  
>  #include "hfsplus_fs.h"
>  #include <linux/posix_acl_xattr.h>
> +#include <linux/nls.h>
>  #include "xattr.h"
>  #include "acl.h"
>  
> @@ -645,8 +646,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
>      struct hfs_find_data fd;
>      u16 key_len = 0;
>      struct hfsplus_attr_key attr_key;
> -    char strbuf[HFSPLUS_ATTR_MAX_STRLEN +
> -            XATTR_MAC_OSX_PREFIX_LEN + 1] = {0};
>
>I think that you missed likewise cases in:

I did not miss them - my patch specifically addresses what comes out
of hfsplus_uni2asc() (maybe I should be clearer about that).
There are only two usages of it, and that's what my patch does.

I had a look at the other usages of HFSPLUS_ATTR_MAX_STRLEN
- some of them are about setting (instead of getting) attributes.
(and quite strangely, some of them says "this is not used" - in one version
of kernel source I check out).

I just think the other usages of HFSPLUS_ATTR_MAX_STRLEN needs
to be addressed by another patch.

So at least my thought is to address the problem with  hfsplus_uni2asc() in
one patch, and other usage of HFSPLUS_ATTR_MAX_STRLEN in another.
Otherwise, we'd have to drop the attribute part of the patch and bundle
that with the attribute related changes, as they are rather large, and also,
as I said, it is not obvious which is get and which is set.

So perhaps I should ask the question - there are only one read-off-diskroutine and one
write-to-disk routine for names (and wrappers around those), which is why it is
quite quick and obvious which one needs to be fixed for names.
But for attributes, again there is *one* place where hfsplus_uni2asc() is used,
but there seems to be a whole lot of other things going on.

In any case, many of them also needs to switch from stack allocation
to heap allocation (and error propagation for those!). It needs to be in a different
patch.

Hin-Tak

>(1) hfsplus_attr_build_key()
>(http://lxr.free-electrons.com/source/fs/hfsplus/attributes.c#L58)
>
>(2) xattr_security.c
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L17)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L23)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L35)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L41)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L65)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L76)
>
>(3) xattr_trusted.c
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L15)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L21)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L33)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L39)
>
>(4) xattr_user.c
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L15)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L21)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L33)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L39)
>
>(5) hfsplus_osx_getxattr()
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L800)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L807)
>
>(6) hfsplus_osx_setxattr()
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L823)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L830)
>
>Thanks,
>Vyacheslav Dubeyko.
>
>


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08 14:08 Hin-Tak Leung
  2014-04-08 14:41 ` Sergei Antonov
  0 siblings, 1 reply; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08 14:08 UTC (permalink / raw)
  To: saproj; +Cc: linux-fsdevel, aia21, anton, slava, viro, hch

------------------------------
On Tue, Apr 8, 2014 12:56 PM BST Sergei Antonov wrote:

>On 8 April 2014 04:19, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>
>> Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
>> is not attainable in the case of conversion to UTF-8, as that
>> requires the use of surrogate pairs, i.e. consuming two storage units.
>
>True that 6 is not attainable, wrong that it is with surrogate pairs.
>A surrogate pair encodes code-points from U+10000 to U+10FFFF, which
>is 4 bytes in UTF-8 (a moderate 2 per one UTF-16 code-unit).
>
>Multiplier 3 is enough for all cases of UTF-16/UCS-2 to UTF-8 conversion.

No. The part of the commit message you skipped, specifically mentioned that
conversion to a GB18030 locale can require x4. x3 is not enough. The sentence
is just a "BTW, the value of 6 can not 'usually' happen within this...".

I only put in UTF-8 there, because it is widely used. It is entirely possible for
a UTF16-BE -> "some-hill-billy-inbreeding-language-that-only-a-few-people-in-the-world-speak"
conversion scheme to hit 6.

Sigh. Remember what I said earlier about arguing about words and meaning
of words not being constructive? The code does the correct thing. I could re-word
the commit message, and/or delete that paragraph, or modifying "... the use of
surrogate pairs *and other means of encoding the higher planes*, i.e. consuming
*more than* two storage units...". and you could probably go on about what
"other means of encoding" is.

You can probably also go on about why it should be "two",
"more than two", but not "more than or equal to two". That few words
in the commit message really does not make any real difference,
and it is also quite unconstructive to argue about the meaning
of a few words, out of context.

Hin-Tak

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08 14:08 Hin-Tak Leung
@ 2014-04-08 14:41 ` Sergei Antonov
  0 siblings, 0 replies; 23+ messages in thread
From: Sergei Antonov @ 2014-04-08 14:41 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: linux-fsdevel@vger.kernel.org, Anton Altaparmakov,
	Anton Altaparmakov, Vyacheslav Dubeyko, Al Viro,
	Christoph Hellwig

On 8 April 2014 16:08, Hin-Tak Leung <htl10@users.sourceforge.net> wrote:
> ------------------------------
> On Tue, Apr 8, 2014 12:56 PM BST Sergei Antonov wrote:
>
>>On 8 April 2014 04:19, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>>
>>> Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
>>> is not attainable in the case of conversion to UTF-8, as that
>>> requires the use of surrogate pairs, i.e. consuming two storage units.
>>
>>True that 6 is not attainable, wrong that it is with surrogate pairs.
>>A surrogate pair encodes code-points from U+10000 to U+10FFFF, which
>>is 4 bytes in UTF-8 (a moderate 2 per one UTF-16 code-unit).
>>
>>Multiplier 3 is enough for all cases of UTF-16/UCS-2 to UTF-8 conversion.
>
> No. The part of the commit message you skipped, specifically mentioned that
> conversion to a GB18030 locale can require x4. x3 is not enough. The sentence
> is just a "BTW, the value of 6 can not 'usually' happen within this...".

My statement explicitly concerns UTF-8. GB18030 is not UTF-8.

> I only put in UTF-8 there, because it is widely used. It is entirely possible for
> a UTF16-BE -> "some-hill-billy-inbreeding-language-that-only-a-few-people-in-the-world-speak"
> conversion scheme to hit 6.
>
> Sigh. Remember what I said earlier about arguing about words and meaning
> of words not being constructive? The code does the correct thing. I could re-word
> the commit message, and/or delete that paragraph, or modifying "... the use of
> surrogate pairs *and other means of encoding the higher planes*, i.e. consuming
> *more than* two storage units...". and you could probably go on about what
> "other means of encoding" is.
>
> You can probably also go on about why it should be "two",
> "more than two", but not "more than or equal to two". That few words
> in the commit message really does not make any real difference,
> and it is also quite unconstructive to argue about the meaning
> of a few words, out of context.

Didn't I understand the quoted text right? I understood it as this:
attaining 6 requires the use of surrogate pairs. That's what I
refuted. No offence. It is just a correction. And yes, IMO errors in
patch description count.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08 14:42 Hin-Tak Leung
  0 siblings, 0 replies; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08 14:42 UTC (permalink / raw)
  To: slava; +Cc: linux-fsdevel, aia21, anton, viro, hch

------------------------------
On Tue, Apr 8, 2014 8:05 AM BST Vyacheslav Dubeyko wrote:

<snipped>
>I think that you missed likewise cases in:

About those bunch of files, I think it needs a 2nd patch. The choice is just between
whether to put the attribute-related uni2asc part of the fix with the uni2asc
name fix, or put it together with this 2nd one.

85+ CJK-character file names are quite conceivable (255 / 3 = 85). attributes
in HFS+ are new, and usage of them - and usage with with long and exotic charsets 
are probably still rare; so there is different urgency to the two.

Also this whole bunch is rather a different matter - as I said, some are
about setting long attributes. Not being able to read long attributes (written
by a different OS, very likely), is quite a different matter from not being
about to write long attributes (you try to write, it doesn't work, hopefully
you get a message saying it is too long, and you try something else,
shorten and abbreviate your attribute, etc).

Also, none of them are directly hooked up to uni2asc, so there are probably
a 3rd item besides expanding the buffer size, error recovery from failed
allocation: error recovery of these from the bottom listattr (I think that's the one
that actually use uni2asc) feeding them something too large
for their liking.

Hin-Tak

>(1) hfsplus_attr_build_key()
>(http://lxr.free-electrons.com/source/fs/hfsplus/attributes.c#L58)
>
>(2) xattr_security.c
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L17)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L23)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L35)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L41)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L65)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_security.c#L76)
>
>(3) xattr_trusted.c
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L15)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L21)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L33)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_trusted.c#L39)
>
>(4) xattr_user.c
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L15)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L21)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L33)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr_user.c#L39)
>
>(5) hfsplus_osx_getxattr()
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L800)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L807)
>
>(6) hfsplus_osx_setxattr()
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L823)
>(http://lxr.free-electrons.com/source/fs/hfsplus/xattr.c#L830)
>
>Thanks,
>Vyacheslav Dubeyko.
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08  2:19 [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes Hin-Tak Leung
  2014-04-08  7:05 ` Vyacheslav Dubeyko
  2014-04-08 11:56 ` Sergei Antonov
@ 2014-04-08 15:06 ` Anton Altaparmakov
  2014-04-08 16:15   ` Vyacheslav Dubeyko
  2014-04-08 16:59 ` Sergei Antonov
  3 siblings, 1 reply; 23+ messages in thread
From: Anton Altaparmakov @ 2014-04-08 15:06 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: linux-fsdevel, Hin-Tak Leung, Vyacheslav Dubeyko, Al Viro,
	Christoph Hellwig

Hi Hin-Tak,

That now looks good.

I agree with you that doing a major cleanup of how attribute names are handled is a separate patch.  As you say that is much more work and it is not as trivial as this.  I think doing the attribute handling properly requires reworking how it all fits together because it is imposing the name length checks at the wrong level - they need to be done after conversion to UTF-16 not whilst they are still in the current NLS (except of course you want to check that they are not too big to fit in the buffer though it is not at all clear to me why the argument is being copied into a buffer in the first place at least in one of the places I looked at - seems a waste of CPU time to me).

Also, I don't think it is needed to set up a kmem cache for this - kmalloc is perfect - as at any one time there will be a vanishingly small number of these buffers allocated at any one time.  After all, how many concurrent readdir() calls are there going to be on a volume at any one time?  It is not like the struct inode or other long lived memory structures where you end up with thousands of them in memory at any one time...

Just one thing I wanted to mention in case you did not know it: kfree(NULL) is allowed - in fact kfree() function starts by doing "if (NULL argument) return;" so in your patch you could instead of adding an additional exit label, just use the generic one that also does a kfree() which is safe as the pointer is NULL.  As this is only error handling code path it does not matter that you incur an additional function call overhead.

Whether you want to change the exit label or not, you can change the CC: to Reviewed-By: for me if you like and I would suggest you send it to Andrew Morton for inclusion through his patch series.

Best regards,

	Anton

On 8 Apr 2014, at 03:19, Hin-Tak Leung <hintak.leung@gmail.com> wrote:

> From: Hin-Tak Leung <htl10@users.sourceforge.net>
> 
> The HFS Plus Volume Format specification (TN1150) states that
> file names are stored internally as a maximum of 255 unicode
> characters, as defined by The Unicode Standard, Version 2.0
> [Unicode, Inc. ISBN 0-201-48345-9]. File names are converted by
> the NLS system on Linux before presented to the user.
> 
> 255 CJK characters converts to UTF-8 with 1 unicode character
> to up to 3 bytes, and to GB18030 with 1 unicode character to
> up to 4 bytes. Thus, trying in a UTF-8 locale to list files
> with names of more than 85 CJK characters results in:
> 
>    $ ls /mnt
>    ls: reading directory /mnt: File name too long
> 
> The receiving buffer needs to be 255 x NLS_MAX_CHARSET_SIZE bytes,
> not 255 bytes as the code has always been.
> 
> Similar consideration applies to attributes, which are stored
> internally as a maximum of 127 UTF-16BE units in HFS+.
> 
> Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
> is not attainable in the case of conversion to UTF-8, as that
> requires the use of surrogate pairs, i.e. consuming two storage units.
> 
> Thanks Anton Altaparmakov for reviewing an earlier version of
> this change.
> 
> Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
> CC: Anton Altaparmakov <anton@tuxera.com>
> CC: Vyacheslav Dubeyko <slava@dubeyko.com>
> CC: Al Viro <viro@zeniv.linux.org.uk>
> CC: Christoph Hellwig <hch@infradead.org>
> ---
> fs/hfsplus/dir.c   | 12 ++++++++++--
> fs/hfsplus/xattr.c | 15 ++++++++++++---
> 2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
> index bdec665..f95abef 100644
> --- a/fs/hfsplus/dir.c
> +++ b/fs/hfsplus/dir.c
> @@ -12,6 +12,7 @@
> #include <linux/fs.h>
> #include <linux/slab.h>
> #include <linux/random.h>
> +#include <linux/nls.h>
> 
> #include "hfsplus_fs.h"
> #include "hfsplus_raw.h"
> @@ -127,7 +128,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> 	struct inode *inode = file_inode(file);
> 	struct super_block *sb = inode->i_sb;
> 	int len, err;
> -	char strbuf[HFSPLUS_MAX_STRLEN + 1];
> +	char *strbuf;
> 	hfsplus_cat_entry entry;
> 	struct hfs_find_data fd;
> 	struct hfsplus_readdir_data *rd;
> @@ -139,6 +140,11 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> 	err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
> 	if (err)
> 		return err;
> +	strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
> +	if (!strbuf) {
> +		err = -ENOMEM;
> +		goto failed_strbuf_alloc;
> +	}
> 	hfsplus_cat_build_key(sb, fd.search_key, inode->i_ino, NULL);
> 	err = hfs_brec_find(&fd, hfs_find_rec_by_key);
> 	if (err)
> @@ -193,7 +199,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
> 		hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
> 			fd.entrylength);
> 		type = be16_to_cpu(entry.type);
> -		len = HFSPLUS_MAX_STRLEN;
> +		len = NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN;
> 		err = hfsplus_uni2asc(sb, &fd.key->cat.name, strbuf, &len);
> 		if (err)
> 			goto out;
> @@ -246,6 +252,8 @@ next:
> 	}
> 	memcpy(&rd->key, fd.key, sizeof(struct hfsplus_cat_key));
> out:
> +	kfree(strbuf);
> +failed_strbuf_alloc:
> 	hfs_find_exit(&fd);
> 	return err;
> }
> diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
> index 4e27edc..cfbb23d 100644
> --- a/fs/hfsplus/xattr.c
> +++ b/fs/hfsplus/xattr.c
> @@ -8,6 +8,7 @@
> 
> #include "hfsplus_fs.h"
> #include <linux/posix_acl_xattr.h>
> +#include <linux/nls.h>
> #include "xattr.h"
> #include "acl.h"
> 
> @@ -645,8 +646,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
> 	struct hfs_find_data fd;
> 	u16 key_len = 0;
> 	struct hfsplus_attr_key attr_key;
> -	char strbuf[HFSPLUS_ATTR_MAX_STRLEN +
> -			XATTR_MAC_OSX_PREFIX_LEN + 1] = {0};
> +	char *strbuf;
> 	int xattr_name_len;
> 
> 	if ((!S_ISREG(inode->i_mode) &&
> @@ -666,6 +666,13 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
> 		return err;
> 	}
> 
> +	strbuf = kzalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN +
> +			XATTR_MAC_OSX_PREFIX_LEN + 1, GFP_KERNEL);
> +	if (!strbuf) {
> +		res = -ENOMEM;
> +		goto failed_strbuf_alloc;
> +	}
> +
> 	err = hfsplus_find_attr(inode->i_sb, inode->i_ino, NULL, &fd);
> 	if (err) {
> 		if (err == -ENOENT) {
> @@ -692,7 +699,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
> 		if (be32_to_cpu(attr_key.cnid) != inode->i_ino)
> 			goto end_listxattr;
> 
> -		xattr_name_len = HFSPLUS_ATTR_MAX_STRLEN;
> +		xattr_name_len = NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN;
> 		if (hfsplus_uni2asc(inode->i_sb,
> 			(const struct hfsplus_unistr *)&fd.key->attr.key_name,
> 					strbuf, &xattr_name_len)) {
> @@ -718,6 +725,8 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
> 	}
> 
> end_listxattr:
> +	kfree(strbuf);
> +failed_strbuf_alloc:
> 	hfs_find_exit(&fd);
> 	return res;
> }
> -- 
> 1.9.0

-- 
Anton Altaparmakov <anton at tuxera.com> (replace at with @)
Senior Kernel Engineer, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08 16:07 Hin-Tak Leung
  0 siblings, 0 replies; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08 16:07 UTC (permalink / raw)
  To: anton; +Cc: linux-fsdevel, slava, viro, hch

Hi Anton,

------------------------------
On Tue, Apr 8, 2014 4:06 PM BST Anton Altaparmakov wrote:

>Hi Hin-Tak,
>
>That now looks good.
>
>I agree with you that doing a major cleanup of how attribute names are handled is a separate patch.  As you say that is much more work and it is not as trivial as this.  I think doing the attribute handling properly requires reworking how it all fits together because it is imposing the name length checks at the wrong level - they need to be done after conversion to UTF-16 not whilst they are still in the current NLS (except of course you want to check that they are not too big to fit in the buffer though it is not at all clear to me why the argument is being copied into a buffer in the first place at least in one of the places I looked at - seems a waste of CPU time to me).
>

The bigger attribute change part has a mixture of prefixes (in bytes and nls local strings), and also moving/concatenating within the buffer, it needs a lot more thinking and work. But I think just to address what comes out of hfsplus_uni2asc() (from on disk to in-memory), the patch is all that is needed.

For the copying of argument, if you are referring to the len being passed into uni2asc - it is being used as both a limit and also a return value for the actual number of byte used.

>Also, I don't think it is needed to set up a kmem cache for this - kmalloc is perfect - as at any one time there will be a vanishingly small number of these buffers allocated at any one time.  After all, how many concurrent readdir() calls are there going to be on a volume at any one time?  It is not like the struct inode or other long lived memory structures where you end up with thousands of them in memory at any one time...
>

I haven't been able to find any reference to which to use (grep'ing under Documentation/* ) - there are a lot of *alloc in the kernel . Even finding kzalloc does what calloc in userland (and kcalloc is a totally different beast) took some effort.

The attribute ones seem to need kzalloc (or, just '\0' at the beginning to null terminate if empty) - they are being used in concatenating, etc. As i said, cleaning that up will be a lot more work.

>Just one thing I wanted to mention in case you did not know it: kfree(NULL) is allowed - in fact kfree() function starts by doing "if (NULL argument) return;" so in your patch you could instead of adding an additional exit label, just use the generic one that also does a kfree() which is safe as the pointer is NULL.  As this is only error handling code path it does not matter that you incur an additional function call overhead.
>

Thanks. I did have the impression and google'd to confirm that a change to make it work that way happened in 2005/2006. I have played with various ways of re-arranging the code, specifically whether strbuf should go before or after find_init, and how they could fail (independently), but in the end, the problem is simply that only one of them can do "if fail then return err". The other will have to clean up the opposite party before the returning. There might be future addition where the order is important.

>Whether you want to change the exit label or not, you can change the CC: to Reviewed-By: for me if you like and I would suggest you send it to Andrew Morton for inclusion through his patch series.

Thanks. I'll possibly change a few words in the commit message. Let's see if others have any more to say.

Hin-Tak

>On 8 Apr 2014, at 03:19, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
>
>> From: Hin-Tak Leung <htl10@users.sourceforge.net>
>> 
>> The HFS Plus Volume Format specification (TN1150) states that
>> file names are stored internally as a maximum of 255 unicode
>> characters, as defined by The Unicode Standard, Version 2.0
>> [Unicode, Inc. ISBN 0-201-48345-9]. File names are converted by
>> the NLS system on Linux before presented to the user.
>> 
>> 255 CJK characters converts to UTF-8 with 1 unicode character
>> to up to 3 bytes, and to GB18030 with 1 unicode character to
>> up to 4 bytes. Thus, trying in a UTF-8 locale to list files
>> with names of more than 85 CJK characters results in:
>> 
>>    $ ls /mnt
>>    ls: reading directory /mnt: File name too long
>> 
>> The receiving buffer needs to be 255 x NLS_MAX_CHARSET_SIZE bytes,
>> not 255 bytes as the code has always been.
>> 
>> Similar consideration applies to attributes, which are stored
>> internally as a maximum of 127 UTF-16BE units in HFS+.
>> 
>> Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
>> is not attainable in the case of conversion to UTF-8, as that
>> requires the use of surrogate pairs, i.e. consuming two storage units.
>> 
>> Thanks Anton Altaparmakov for reviewing an earlier version of
>> this change.
>> 
>> Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
>> CC: Anton Altaparmakov <anton@tuxera.com>
>> CC: Vyacheslav Dubeyko <slava@dubeyko.com>
>> CC: Al Viro <viro@zeniv.linux.org.uk>
>> CC: Christoph Hellwig <hch@infradead.org>
>> ---
>> fs/hfsplus/dir.c   | 12 ++++++++++--
>> fs/hfsplus/xattr.c | 15 ++++++++++++---
>> 2 files changed, 22 insertions(+), 5 deletions(-)
>> 
>> diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
>> index bdec665..f95abef 100644
>> --- a/fs/hfsplus/dir.c
>> +++ b/fs/hfsplus/dir.c
>> @@ -12,6 +12,7 @@
>> #include <linux/fs.h>
>> #include <linux/slab.h>
>> #include <linux/random.h>
>> +#include <linux/nls.h>
>> 
>> #include "hfsplus_fs.h"
>> #include "hfsplus_raw.h"
>> @@ -127,7 +128,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
>>     struct inode *inode = file_inode(file);
>>     struct super_block *sb = inode->i_sb;
>>     int len, err;
>> -    char strbuf[HFSPLUS_MAX_STRLEN + 1];
>> +    char *strbuf;
>>     hfsplus_cat_entry entry;
>>     struct hfs_find_data fd;
>>     struct hfsplus_readdir_data *rd;
>> @@ -139,6 +140,11 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
>>     err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
>>     if (err)
>>         return err;
>> +    strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);
>> +    if (!strbuf) {
>> +        err = -ENOMEM;
>> +        goto failed_strbuf_alloc;
>> +    }
>>     hfsplus_cat_build_key(sb, fd.search_key, inode->i_ino, NULL);
>>     err = hfs_brec_find(&fd, hfs_find_rec_by_key);
>>     if (err)
>> @@ -193,7 +199,7 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
>>         hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
>>             fd.entrylength);
>>         type = be16_to_cpu(entry.type);
>> -        len = HFSPLUS_MAX_STRLEN;
>> +        len = NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN;
>>         err = hfsplus_uni2asc(sb, &fd.key->cat.name, strbuf, &len);
>>         if (err)
>>             goto out;
>> @@ -246,6 +252,8 @@ next:
>>     }
>>     memcpy(&rd->key, fd.key, sizeof(struct hfsplus_cat_key));
>> out:
>> +    kfree(strbuf);
>> +failed_strbuf_alloc:
>>     hfs_find_exit(&fd);
>>     return err;
>> }
>> diff --git a/fs/hfsplus/xattr.c b/fs/hfsplus/xattr.c
>> index 4e27edc..cfbb23d 100644
>> --- a/fs/hfsplus/xattr.c
>> +++ b/fs/hfsplus/xattr.c
>> @@ -8,6 +8,7 @@
>> 
>> #include "hfsplus_fs.h"
>> #include <linux/posix_acl_xattr.h>
>> +#include <linux/nls.h>
>> #include "xattr.h"
>> #include "acl.h"
>> 
>> @@ -645,8 +646,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
>>     struct hfs_find_data fd;
>>     u16 key_len = 0;
>>     struct hfsplus_attr_key attr_key;
>> -    char strbuf[HFSPLUS_ATTR_MAX_STRLEN +
>> -            XATTR_MAC_OSX_PREFIX_LEN + 1] = {0};
>> +    char *strbuf;
>>     int xattr_name_len;
>> 
>>     if ((!S_ISREG(inode->i_mode) &&
>> @@ -666,6 +666,13 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
>>         return err;
>>     }
>> 
>> +    strbuf = kzalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN +
>> +            XATTR_MAC_OSX_PREFIX_LEN + 1, GFP_KERNEL);
>> +    if (!strbuf) {
>> +        res = -ENOMEM;
>> +        goto failed_strbuf_alloc;
>> +    }
>> +
>>     err = hfsplus_find_attr(inode->i_sb, inode->i_ino, NULL, &fd);
>>     if (err) {
>>         if (err == -ENOENT) {
>> @@ -692,7 +699,7 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
>>         if (be32_to_cpu(attr_key.cnid) != inode->i_ino)
>>             goto end_listxattr;
>> 
>> -        xattr_name_len = HFSPLUS_ATTR_MAX_STRLEN;
>> +        xattr_name_len = NLS_MAX_CHARSET_SIZE * HFSPLUS_ATTR_MAX_STRLEN;
>>         if (hfsplus_uni2asc(inode->i_sb,
>>             (const struct hfsplus_unistr *)&fd.key->attr.key_name,
>>                     strbuf, &xattr_name_len)) {
>> @@ -718,6 +725,8 @@ ssize_t hfsplus_listxattr(struct dentry *dentry, char *buffer, size_t size)
>>     }
>> 
>> end_listxattr:
>> +    kfree(strbuf);
>> +failed_strbuf_alloc:
>>     hfs_find_exit(&fd);
>>     return res;
>> }
>> -- 
>> 1.9.0
>
>-- 
>Anton Altaparmakov <anton at tuxera.com> (replace at with @)
>Senior Kernel Engineer, Tuxera Inc., http://www.tuxera.com/
>Linux NTFS maintainer
>

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08 15:06 ` Anton Altaparmakov
@ 2014-04-08 16:15   ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-08 16:15 UTC (permalink / raw)
  To: Anton Altaparmakov
  Cc: Hin-Tak Leung, linux-fsdevel, Hin-Tak Leung, Al Viro,
	Christoph Hellwig

On Tue, 2014-04-08 at 16:06 +0100, Anton Altaparmakov wrote:
> Hi Hin-Tak,
> 
> That now looks good.
> 
> I agree with you that doing a major cleanup of how attribute names are
> handled is a separate patch.  As you say that is much more work and it
> is not as trivial as this.  I think doing the attribute handling
> properly requires reworking how it all fits together because it is
> imposing the name length checks at the wrong level - they need to be
> done after conversion to UTF-16 not whilst they are still in the
> current NLS (except of course you want to check that they are not too
> big to fit in the buffer though it is not at all clear to me why the
> argument is being copied into a buffer in the first place at least in
> one of the places I looked at - seems a waste of CPU time to me).
> 
Yes, it is possible to split the work on several patches. I agree with
it.

But I suppose that these patches needs to be joined in one patchset.
Because reported issue can be solved by fix as for file names as for
xattr names.

Of course, it is possible to fix the issue with file names, firstly, and
then the issue with xattr names. But it doesn't make sense to mix fixes
for file names and for xattr names in one patch. Because partial fix in
hfsplus_listxattr() doesn't fix the issue for xattr names.

> Also, I don't think it is needed to set up a kmem cache for this -
> kmalloc is perfect - as at any one time there will be a vanishingly
> small number of these buffers allocated at any one time.  After all,
> how many concurrent readdir() calls are there going to be on a volume
> at any one time?  It is not like the struct inode or other long lived
> memory structures where you end up with thousands of them in memory at
> any one time...

Yes, kmalloc can be enough for readdir() case. But I suppose that it
needs to use kmem cache for the case of xattr names. It will be more
efficient solution.

Thanks,
Vyacheslav Dubeyko.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08 16:43 Hin-Tak Leung
  2014-04-08 16:55 ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08 16:43 UTC (permalink / raw)
  To: slava, anton; +Cc: hintak.leung, linux-fsdevel, viro, hch

------------------------------
On Tue, Apr 8, 2014 5:15 PM BST Vyacheslav Dubeyko wrote:

>On Tue, 2014-04-08 at 16:06 +0100, Anton Altaparmakov wrote:
>> Hi Hin-Tak,
>> 
>> That now looks good.
>> 
>> I agree with you that doing a major cleanup of how attribute names are
>> handled is a separate patch.  As you say that is much more work and it
>> is not as trivial as this.  I think doing the attribute handling
>> properly requires reworking how it all fits together because it is
>> imposing the name length checks at the wrong level - they need to be
>> done after conversion to UTF-16 not whilst they are still in the
>> current NLS (except of course you want to check that they are not too
>> big to fit in the buffer though it is not at all clear to me why the
>> argument is being copied into a buffer in the first place at least in
>> one of the places I looked at - seems a waste of CPU time to me).
>> 
>Yes, it is possible to split the work on several patches. I agree with
>it.
>
>But I suppose that these patches needs to be joined in one patchset.
>Because reported issue can be solved by fix as for file names as for
>xattr names.
>
>Of course, it is possible to fix the issue with file names, firstly, and
>then the issue with xattr names. But it doesn't make sense to mix fixes
>for file names and for xattr names in one patch. Because partial fix in
>hfsplus_listxattr() doesn't fix the issue for xattr names.
>

Think of it as "fixes worst-case usage of hfsplus_uni2asc()",
then it makes sense to have this as one patch. Allowing full usage
of getting and setting of long multi-lingual attributes, and checking
other uses of HFSPLUS_ATTR_MAX_STRLEN really need to be another
(multiple others).

There are only two usages of hfsplus_uni2asc(), one for names and
one for attributes.

>> Also, I don't think it is needed to set up a kmem cache for this -
>> kmalloc is perfect - as at any one time there will be a vanishingly
>> small number of these buffers allocated at any one time.  After all,
>> how many concurrent readdir() calls are there going to be on a volume
>> at any one time?  It is not like the struct inode or other long lived
>> memory structures where you end up with thousands of them in memory at
>> any one time...
>
>Yes, kmalloc can be enough for readdir() case. But I suppose that it
>needs to use kmem cache for the case of xattr names. It will be more
>efficient solution.
>



--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08 16:43 Hin-Tak Leung
@ 2014-04-08 16:55 ` Vyacheslav Dubeyko
  2014-04-08 17:09   ` Anton Altaparmakov
  0 siblings, 1 reply; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-08 16:55 UTC (permalink / raw)
  To: htl10; +Cc: anton, hintak.leung, linux-fsdevel, viro, hch


On Apr 8, 2014, at 8:43 PM, Hin-Tak Leung wrote:

[snip]

>> Yes, it is possible to split the work on several patches. I agree with
>> it.
>> 
>> But I suppose that these patches needs to be joined in one patchset.
>> Because reported issue can be solved by fix as for file names as for
>> xattr names.
>> 
>> Of course, it is possible to fix the issue with file names, firstly, and
>> then the issue with xattr names. But it doesn't make sense to mix fixes
>> for file names and for xattr names in one patch. Because partial fix in
>> hfsplus_listxattr() doesn't fix the issue for xattr names.
>> 
> 
> Think of it as "fixes worst-case usage of hfsplus_uni2asc()",
> then it makes sense to have this as one patch. Allowing full usage
> of getting and setting of long multi-lingual attributes, and checking
> other uses of HFSPLUS_ATTR_MAX_STRLEN really need to be another
> (multiple others).
> 
> There are only two usages of hfsplus_uni2asc(), one for names and
> one for attributes.

Unfortunately, it doesn't make sense to have fix only in hfsplus_listxattr().
It fixes nothing for xattr names. Of course, you can mention name of the patch. :)
But such partial fix doesn't make sense for me. It will be more clear way to have
one patch for file names and another patch for xattr names.

Thanks,
Vyacheslav Dubeyko.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08  2:19 [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes Hin-Tak Leung
                   ` (2 preceding siblings ...)
  2014-04-08 15:06 ` Anton Altaparmakov
@ 2014-04-08 16:59 ` Sergei Antonov
  3 siblings, 0 replies; 23+ messages in thread
From: Sergei Antonov @ 2014-04-08 16:59 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: linux-fsdevel@vger.kernel.org, Anton Altaparmakov, Hin-Tak Leung,
	Anton Altaparmakov, Vyacheslav Dubeyko, Al Viro,
	Christoph Hellwig

On 8 April 2014 04:19, Hin-Tak Leung <hintak.leung@gmail.com> wrote:
> @@ -139,6 +140,11 @@ static int hfsplus_readdir(struct file *file, struct dir_context *ctx)
>         err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
>         if (err)
>                 return err;
> +       strbuf = kmalloc(NLS_MAX_CHARSET_SIZE * HFSPLUS_MAX_STRLEN + 1, GFP_KERNEL);

Exceeds 80 ch. limit. Break the line after comma.
Use "scripts/checkpatch.pl <fname>" to catch formatting errors like this.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08 16:55 ` Vyacheslav Dubeyko
@ 2014-04-08 17:09   ` Anton Altaparmakov
  2014-04-08 17:21     ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 23+ messages in thread
From: Anton Altaparmakov @ 2014-04-08 17:09 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: htl10, hintak.leung, linux-fsdevel, viro, hch

Hi,

On 8 Apr 2014, at 17:55, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Apr 8, 2014, at 8:43 PM, Hin-Tak Leung wrote:
> [snip]
>>> Yes, it is possible to split the work on several patches. I agree with
>>> it.
>>> 
>>> But I suppose that these patches needs to be joined in one patchset.
>>> Because reported issue can be solved by fix as for file names as for
>>> xattr names.
>>> 
>>> Of course, it is possible to fix the issue with file names, firstly, and
>>> then the issue with xattr names. But it doesn't make sense to mix fixes
>>> for file names and for xattr names in one patch. Because partial fix in
>>> hfsplus_listxattr() doesn't fix the issue for xattr names.
>>> 
>> 
>> Think of it as "fixes worst-case usage of hfsplus_uni2asc()",
>> then it makes sense to have this as one patch. Allowing full usage
>> of getting and setting of long multi-lingual attributes, and checking
>> other uses of HFSPLUS_ATTR_MAX_STRLEN really need to be another
>> (multiple others).
>> 
>> There are only two usages of hfsplus_uni2asc(), one for names and
>> one for attributes.
> 
> Unfortunately, it doesn't make sense to have fix only in hfsplus_listxattr().
> It fixes nothing for xattr names. Of course, you can mention name of the patch. :)
> But such partial fix doesn't make sense for me. It will be more clear way to have
> one patch for file names and another patch for xattr names.

You are entitled to your opinion but I agree with Hin-Tak.  This makes perfect sense as it fixes the use of a function which can otherwise cause an erroneous EIO error to be returned from listxattr().  This in itself is a worthwhile fix.  Your idea of having to fix everything or nothing at all is frankly quite silly and would mean that hardly anything ever gets fixed...  Fixing small problems individually is just fine, especially when they are such obvious ones.

Best regards,

	Anton

> Thanks,
> Vyacheslav Dubeyko.

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge
J.J. Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08 17:09   ` Anton Altaparmakov
@ 2014-04-08 17:21     ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-08 17:21 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: htl10, hintak.leung, linux-fsdevel, viro, hch


On Apr 8, 2014, at 9:09 PM, Anton Altaparmakov wrote:

[snip]

>> 
>> Unfortunately, it doesn't make sense to have fix only in hfsplus_listxattr().
>> It fixes nothing for xattr names. Of course, you can mention name of the patch. :)
>> But such partial fix doesn't make sense for me. It will be more clear way to have
>> one patch for file names and another patch for xattr names.
> 
> You are entitled to your opinion but I agree with Hin-Tak.  This makes perfect sense as it fixes the use of a function which can otherwise cause an erroneous EIO error to be returned from listxattr().  This in itself is a worthwhile fix.  Your idea of having to fix everything or nothing at all is frankly quite silly and would mean that hardly anything ever gets fixed...  Fixing small problems individually is just fine, especially when they are such obvious ones.
> 

I have another opinion. :) But, of course, it is possible to see on things in
different ways. I don't think that topic of splitting fixes between patches
is really important. But my suggestion has more logical basis.

But, anyway, I suggest to use kmem cache for the case of xattr names
instead of kmalloc.

Thanks,
Vyacheslav Dubeyko.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-08 19:53 Hin-Tak Leung
  2014-04-09  6:46 ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-08 19:53 UTC (permalink / raw)
  To: slava, anton; +Cc: linux-fsdevel, viro, hch

------------------------------
On Tue, Apr 8, 2014 6:21 PM BST Vyacheslav Dubeyko wrote:

>
>On Apr 8, 2014, at 9:09 PM, Anton Altaparmakov wrote:
>
>[snip]
>
>> 
>> Unfortunately, it doesn't make sense to have fix only in hfsplus_listxattr().
>> It fixes nothing for xattr names. Of course, you can mention name of the patch. :)
>> But such partial fix doesn't make sense for me. It will be more clear way to have
>> one patch for file names and another patch for xattr names.
>> 
>> You are entitled to your opinion but I agree with Hin-Tak.  This makes perfect sense as it fixes the use of a function which can otherwise cause an erroneous EIO error to be returned from listxattr().  This in itself is a worthwhile fix.  Your idea of having to fix everything or nothing at all is frankly quite silly and would mean that hardly anything ever gets fixed...  Fixing small problems individually is just fine, especially when they are such obvious ones.
>> 
>
>I have another opinion. :) But, of course, it is possible to see on things in
>different ways. I don't think that topic of splitting fixes between patches
>is really important. But my suggestion has more logical basis.
>
>But, anyway, I suggest to use kmem cache for the case of xattr names
>instead of kmalloc.

This patch fix usages of hfsplus_uni2asc(). There are only two.

I think we all agree that there is a lot more work to be done in the attribute side to make sure that
getting and setting long non-english attributes to work. And there will be a series of patches to it.
The discussion is really on how this is split.

There are two uses of HFSPLUS_MAX_STRLEN concern asc2uni(), and
are correct where they are. There is one other use in

super.c:319:	buf->f_namelen = HFSPLUS_MAX_STRLEN;

That may need some thinking. It is probably correct.

For HFSPLUS_ATTR_MAX_STRLEN, besides the one I changed, there are 9 other uses
as size of buffers (and a similar number in checks for length over-run, etc).

As I wrote earlier, some of these are setters, rather than getters. Setters and getters
are different. The setters probably should use a large enough buffer according
to userland, (ie. not a file-system specific one), and let the lower-layer of the file system
returns an error when one is trying to set a too-long attribute. The getters needs
to cope with what listxttr() returns.

So I see there should be at least two forth-coming patches related to attributes.
One can bundle the attribute part of the current patch to the latter; but then that's assuming
that the getter part of these further patches is logically one. It may be complex
enough to require splitting up. We have 9 buffers, divisible at least now into two groups,
getters and setters - it might need to be divided further. There are OS-prefixes
complications along the way, so we may end up having to do 3-4 additional patches.

At this point it is not obvious that half of the current patch should be
deferred, and where/what it should defer to. So I am thinking just "commit early,
commit often".

Retrospectively - when we look back on it afterwards, in a few month's time -,
the whole set of work will form a patch series; but one advantage of *not* declaring
a patch series, is that it allows others to work on it. e.g. It may be important for
Sergei to put his name down on one of these further patches, if claiming a change
under his/her name is an important issue :-).

85+ CJK characters for file names isn't too uncommon. I have non-english
file names on my hard disk with say, 50+ characters.

If you insist on doing Chinese all the way, 127/3 = 42 is really a bit low for a
descriptive string of any usage, but does anybody use non-english extended
attributes on any OSes?

Hin-Tak
 


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-08 19:53 Hin-Tak Leung
@ 2014-04-09  6:46 ` Vyacheslav Dubeyko
  2014-04-09  8:58   ` Anton Altaparmakov
  0 siblings, 1 reply; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-09  6:46 UTC (permalink / raw)
  To: htl10; +Cc: anton, linux-fsdevel, viro, hch

On Tue, 2014-04-08 at 20:53 +0100, Hin-Tak Leung wrote:

[snip]

> 
> If you insist on doing Chinese all the way, 127/3 = 42 is really a bit low for a
> descriptive string of any usage, but does anybody use non-english extended
> attributes on any OSes?
> 

If you think that nobody use non-english extended attributes then it
doesn't make sense to make any fixes in xattr subsystem of HFS+. And
partial code changing in hfsplus_listxattr() is completely useless.
Because getxattr() and setxattr() series of methods will be the primary
source of error in the case of worst-case unicode to char conversion.

Otherwise, you can use any splitting of fixes between patches. In my
last e-mail I suggested to use kmem cache approach for xattr names
instead of kmalloc only.

Thanks,
Vyacheslav Dubeyko.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-09  6:46 ` Vyacheslav Dubeyko
@ 2014-04-09  8:58   ` Anton Altaparmakov
  2014-04-10  7:04     ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 23+ messages in thread
From: Anton Altaparmakov @ 2014-04-09  8:58 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: htl10, linux-fsdevel, viro, hch

Hi Vyacheslav,

On 9 Apr 2014, at 07:46, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Tue, 2014-04-08 at 20:53 +0100, Hin-Tak Leung wrote:
> [snip]
>> 
>> If you insist on doing Chinese all the way, 127/3 = 42 is really a bit low for a
>> descriptive string of any usage, but does anybody use non-english extended
>> attributes on any OSes?
>> 
> 
> If you think that nobody use non-english extended attributes then it
> doesn't make sense to make any fixes in xattr subsystem of HFS+.

It doesn't matter what anyone things.  It matters to do what is right.  And right is to allow anything that would work on OSX.

> And partial code changing in hfsplus_listxattr() is completely useless.

No it is not at all useless.  It is perfectly valid to run listxattr() without then running getxattr() on all the attributes!  If an application cares about attributes foo and bar and there is also an attribute with a super long name then as things are at the moment, the listxattr() will fail with EIO error and the application will fail badly probably telling you your file system is corrupt.  With the fix from Hin-Tak, the listxattr() will work fine and then the application will work fine because it will only getxattr() foo and bar.  The difference being an application failing due to apparently corrupt file system or broken disk and an application working.  How can you possibly state this fix is useless!?!  Unbelievable...

> Because getxattr() and setxattr() series of methods will be the primary
> source of error in the case of worst-case unicode to char conversion.
> 
> Otherwise, you can use any splitting of fixes between patches. In my
> last e-mail I suggested to use kmem cache approach for xattr names
> instead of kmalloc only.

Yes but I don't think you have given a good reason why it would be worth to create a kmem cache for such temporary buffers and IMO it makes no sense to use them - kmalloc and friends are a better choice in this use case.  Please remember that kmalloc is effectively using a kmem cache - just a set of generic ones (depending on size of allocation - just look at /proc/slabinfo some time) instead of a specialised one so there is no benefit in using a dedicated kmem cache unless you are going to have lots of those buffers around which you are not as they only live for the duration of the get/set/list function call.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge
J.J. Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-09 14:28 Hin-Tak Leung
  2014-04-10  7:23 ` Vyacheslav Dubeyko
  0 siblings, 1 reply; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-09 14:28 UTC (permalink / raw)
  To: slava; +Cc: anton, linux-fsdevel, viro, hch

Hi Vyacheslav,

------------------------------
On Wed, Apr 9, 2014 7:46 AM BST Vyacheslav Dubeyko wrote:

>On Tue, 2014-04-08 at 20:53 +0100, Hin-Tak Leung wrote:
>
>[snip]
>
>> 
>> If you insist on doing Chinese all the way, 127/3 = 42 is really a bit low for a
>> descriptive string of any usage, but does anybody use non-english extended
>> attributes on any OSes?
>> 
>
>If you think that nobody use non-english extended attributes then it
>doesn't make sense to make any fixes in xattr subsystem of HFS+. And
>partial code changing in hfsplus_listxattr() is completely useless.
>Because getxattr() and setxattr() series of methods will be the primary
>source of error in the case of worst-case unicode to char conversion.
>

That was an information-gathering question - and also, once a feature is generally
available on every fs, it probably will be used (and abused). Nobody uses
attributes much ATM probably because there are a few common
fs'es which does not have them (*cough* fat32 *cough*).

Maybe the Chinese or the Koreans would use local namspaces for their 
version of selinux :-).

There are different ways of dividing a series of patches. One can go about
it purpose-based; but then, API-based (a patch for every usage instance of one
function, one type of changes) is valid too, and sometimes cleaner to review.

I have thought of one reason why one might want to split the current patch
up: for cherry-picking/back-porting the long name fix to much older kernels which
do not have the relatively new attribute stuff. But then, it is simple enough
that it isn't too painful to backport by hand, if anybody really want it.

Anyway, there are more patches to come, I am just not insisting on a declaring
a plan for an explicit series. Anybody who wants to put his name
down on further corrections is free to have a go.

>Otherwise, you can use any splitting of fixes between patches. In my
>last e-mail I suggested to use kmem cache approach for xattr names
>instead of kmalloc only.

I looked up what kmem cache does - I am not entirely convinced it would be
faster/better. Remember we are talking about converting
stack allocations which last the duration of 1 routine (the listxattr is longer,
but all the others seem to be quite short), to dynamic allocations.
They are fairly short-duration. kmem cache is more useful with things
like inodes and dentry, which are relatively long lasting uniform sizes,
but comes and goes. It could even be the case that kmem cache is
*slower* in this case because of its small over-heads for exremely temporary
allocations like these. We just don't know yet - or at least I don't know it yet :-).

Hin-Tak


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-09  8:58   ` Anton Altaparmakov
@ 2014-04-10  7:04     ` Vyacheslav Dubeyko
  2014-04-10  9:13       ` Anton Altaparmakov
  0 siblings, 1 reply; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-10  7:04 UTC (permalink / raw)
  To: Anton Altaparmakov; +Cc: htl10, linux-fsdevel, viro, hch

Hi Anton,

On Wed, 2014-04-09 at 09:58 +0100, Anton Altaparmakov wrote:

> > 
> > If you think that nobody use non-english extended attributes then it
> > doesn't make sense to make any fixes in xattr subsystem of HFS+.
> 
> It doesn't matter what anyone things.  It matters to do what is right.  And right is to allow anything that would work on OSX.
> 

Yes, right things should be right. :)

I misunderstand where you've found in my sentence differences with your
statement "to allow anything that would work on OSX". So, I see only
useless objection here. Sorry.

> > And partial code changing in hfsplus_listxattr() is completely useless.
> 
> No it is not at all useless.  It is perfectly valid to run listxattr() without then running getxattr() on all the attributes!  If an application cares about attributes foo and bar and there is also an attribute with a super long name then as things are at the moment, the listxattr() will fail with EIO error and the application will fail badly probably telling you your file system is corrupt.  With the fix from Hin-Tak, the listxattr() will work fine and then the application will work fine because it will only getxattr() foo and bar.  The difference being an application failing due to apparently corrupt file system or broken disk and an application working.  How can you possibly state this fix is useless!?!  Unbelievable...
> 

Yes, it's unbelievable that another guy has another opinion. :) I have
another vision here. Sorry. But I have right to think in other way and
to share my opinion freely. And I don't think that your vision is single
and only way.

> > Because getxattr() and setxattr() series of methods will be the primary
> > source of error in the case of worst-case unicode to char conversion.
> > 
> > Otherwise, you can use any splitting of fixes between patches. In my
> > last e-mail I suggested to use kmem cache approach for xattr names
> > instead of kmalloc only.
> 
> Yes but I don't think you have given a good reason why it would be worth to create a kmem cache for such temporary buffers and IMO it makes no sense to use them - kmalloc and friends are a better choice in this use case.  Please remember that kmalloc is effectively using a kmem cache - just a set of generic ones (depending on size of allocation - just look at /proc/slabinfo some time) instead of a specialised one so there is no benefit in using a dedicated kmem cache unless you are going to have lots of those buffers around which you are not as they only live for the duration of the get/set/list function call.
> 

I see similarity between xattr name buffers and inodes buffers. So, I've
suggested to use kmem cache because of such similarity. As far as I can
judge, you mean that xattr name buffers will live short time. And, as a
result, kmalloc will be much better choice. Do you mean this? Am I
correct? Otherwise, it makes sense to use simple kmalloc and for inode
buffers.

Thanks,
Vyacheslav Dubeyko.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-09 14:28 Hin-Tak Leung
@ 2014-04-10  7:23 ` Vyacheslav Dubeyko
  0 siblings, 0 replies; 23+ messages in thread
From: Vyacheslav Dubeyko @ 2014-04-10  7:23 UTC (permalink / raw)
  To: htl10; +Cc: anton, linux-fsdevel, viro, hch

Hi Hin-Tak,

On Wed, 2014-04-09 at 15:28 +0100, Hin-Tak Leung wrote:

> 
> There are different ways of dividing a series of patches. One can go about
> it purpose-based; but then, API-based (a patch for every usage instance of one
> function, one type of changes) is valid too, and sometimes cleaner to review.
> 
> I have thought of one reason why one might want to split the current patch
> up: for cherry-picking/back-porting the long name fix to much older kernels which
> do not have the relatively new attribute stuff. But then, it is simple enough
> that it isn't too painful to backport by hand, if anybody really want it.
> 
> Anyway, there are more patches to come, I am just not insisting on a declaring
> a plan for an explicit series. Anybody who wants to put his name
> down on further corrections is free to have a go.
> 

Yes, it is possible to have different splitting fixes between patches. I
simply think that if you mix the fix for file names and partial fix in
hfsplus_listxattr() in one patch then it needs to use patchset scheme.
Because rest fixes for xattr names should be in other patches of the
patchset. Otherwise, it needs to have one patch for file names fix and
other patches for xattr names. Of course, it's only my opinion and I
don't insist on this way. But it's right way for me.

> >Otherwise, you can use any splitting of fixes between patches. In my
> >last e-mail I suggested to use kmem cache approach for xattr names
> >instead of kmalloc only.
> 
> I looked up what kmem cache does - I am not entirely convinced it would be
> faster/better. Remember we are talking about converting
> stack allocations which last the duration of 1 routine (the listxattr is longer,
> but all the others seem to be quite short), to dynamic allocations.
> They are fairly short-duration. kmem cache is more useful with things
> like inodes and dentry, which are relatively long lasting uniform sizes,
> but comes and goes. It could even be the case that kmem cache is
> *slower* in this case because of its small over-heads for exremely temporary
> allocations like these. We just don't know yet - or at least I don't know it yet :-).
> 

But what difference between inode buffers and xattr name buffers? Inode
buffer has fixed size and xatr buffer has fixed size. Inodes and xattr
names comes and goes. Maybe, xattr name buffer has shorter lifetime.
Efficiency is not only performance but good way of memory using. We know
that xattr buffer has fixed size and it can be many simultaneous
requests for such buffers creation. When we create kmem cache then we
share our knowledge about fixed size nature of data with memory
subsystem. Otherwise, memory subsystem will work in more sophisticated
way. Maybe, I am wrong or misunderstand something here. But I see
similarity between inode buffers and xattr name buffers.

Thanks,
Vyacheslav Dubeyko.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
  2014-04-10  7:04     ` Vyacheslav Dubeyko
@ 2014-04-10  9:13       ` Anton Altaparmakov
  0 siblings, 0 replies; 23+ messages in thread
From: Anton Altaparmakov @ 2014-04-10  9:13 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: htl10, linux-fsdevel, viro, hch

Hi Vyacheslav,

On 10 Apr 2014, at 08:04, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
> On Wed, 2014-04-09 at 09:58 +0100, Anton Altaparmakov wrote:
>>> If you think that nobody use non-english extended attributes then it
>>> doesn't make sense to make any fixes in xattr subsystem of HFS+.
>> 
>> It doesn't matter what anyone things.  It matters to do what is right.  And right is to allow anything that would work on OSX.
> 
> Yes, right things should be right. :)
> 
> I misunderstand where you've found in my sentence differences with your
> statement "to allow anything that would work on OSX". So, I see only
> useless objection here. Sorry.

>From what I understood your sentence to mean (and I apologize if I misunderstood you) you said that "unless you have example or knowledge of someone using non-english extended attributes then there is no need to fix this issue on HFS+".  And I think that is very wrong.  Just because we don't have an example is not a good reason not to fix something.  If it works on OSX which it does, then it should work on Linux, too.  No matter whether anyone can provide an example of someone using it or not.  It should be fixed simply because it is wrong and that is something that is obviously the case.  The Linux kernel is not a business where the mentality of "if there is no customer bug report stop using company time on fixing it" is common place.  Here we strive to always be correct.  And what you sa
 id (from what I understood you to mean anyway) was exactly the same as that business mentality of "if no example == customer then don't fix it"...

>>> Because getxattr() and setxattr() series of methods will be the primary
>>> source of error in the case of worst-case unicode to char conversion.
>>> 
>>> Otherwise, you can use any splitting of fixes between patches. In my
>>> last e-mail I suggested to use kmem cache approach for xattr names
>>> instead of kmalloc only.
>> 
>> Yes but I don't think you have given a good reason why it would be worth to create a kmem cache for such temporary buffers and IMO it makes no sense to use them - kmalloc and friends are a better choice in this use case.  Please remember that kmalloc is effectively using a kmem cache - just a set of generic ones (depending on size of allocation - just look at /proc/slabinfo some time) instead of a specialised one so there is no benefit in using a dedicated kmem cache unless you are going to have lots of those buffers around which you are not as they only live for the duration of the get/set/list function call.
> 
> I see similarity between xattr name buffers and inodes buffers. So, I've
> suggested to use kmem cache because of such similarity. As far as I can
> judge, you mean that xattr name buffers will live short time. And, as a
> result, kmalloc will be much better choice. Do you mean this? Am I
> correct? Otherwise, it makes sense to use simple kmalloc and for inode
> buffers.

There is no similarity between inode buffers and xattr name buffers other than the fact that they are both buffers!

xattr name buffer is currently allocated on the stack thus the buffer only exists for the duration of the function and is then thrown away.  Thus at any one point in time there will be usually zero such buffers in memory and even if several processes simultaneously decide to do xattr related operations you would still at any one point in time have a maximum of "number of processes" such buffers allocated in memory.

inode buffers are very long lived buffers, on any typical system there are tens of thousands of them in memory at any one time and they stay in memory for a long time some even the entire time the kernel is running.  On a busy server the number of inode buffer will go into the millions all in memory at same time.

Just the overhead of allocating and maintaining a kmem cache just for those almost never used xattr name buffers will be much larger in terms of memory and CPU use than the buffers you will allocate themselves.  Thus it makes no sense to use kmem cache - just using kmalloc will use the appropriate, generic kmalloc kmem cache.

How can you possibly be comparing these two things and saying they are similar?!?  You are comparing apples to oranges...

>From a completely idle, small server using ext3 file system for / which is 56GiB in size, looking at /proc/slabinfo we can see the ext3 inode kmem cache and the generic 1024-byte kmem cache for kmalloc:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ext3_inode_cache   15105  15105    816    5    1 : tunables   54   27    8 : slabdata   3021   3021  
size-1024            907    968   1024    4    1 : tunables   54   27    8 : slabdata    242    242      0

So there are 15105 cached "struct inode" on the root file system and at present there are 907 1024-byte sized kmalloc() buffers in memory.

Thus allocating a handful or even 10s or 100s more of 1024-byte size kmalloc() buffers is hardly going to make any difference to anything and fits nicely in what the size-1024 kmem cache is for.

The amount of wasted ram is absolutely minimal as the number of xattr name buffers is so small.

I think you really do not understand how memory allocators work...  )-:

Best regards,

	Anton

> Thanks,
> Vyacheslav Dubeyko.

-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge
J.J. Thomson Avenue, Cambridge, CB3 0RB, UK


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes
@ 2014-04-10 14:42 Hin-Tak Leung
  0 siblings, 0 replies; 23+ messages in thread
From: Hin-Tak Leung @ 2014-04-10 14:42 UTC (permalink / raw)
  To: slava; +Cc: anton, linux-fsdevel, viro, hch

Hi Vyacheslav,

------------------------------
On Thu, Apr 10, 2014 8:23 AM BST Vyacheslav Dubeyko wrote:

<snipped>
>Yes, it is possible to have different splitting fixes between patches. I
>simply think that if you mix the fix for file names and partial fix in
>hfsplus_listxattr() in one patch then it needs to use patchset scheme.
>Because rest fixes for xattr names should be in other patches of the
>patchset. Otherwise, it needs to have one patch for file names fix and
>other patches for xattr names. Of course, it's only my opinion and I
>don't insist on this way. But it's right way for me.

ATM, I am seeing a division of correcting all usages of hfsplus_uni2asc(),
and correcting all usages of HFSPLUS_ATTR_MAX_LENGTH. And I am thinking
the latter needs a new function, a way of calculating the unicode length
of an nls string; so there will be at least 3 patches in total. (I might change
my mind later - but it is at least 3 for now).

The one in hfsplus_listxattr() falls in both categories as a usage of hfsplus_uni2asc()
and also a usage of HFSPLUS_ATTR_MAX_LENGTH, and so can go into either camp.

I can see one argument for your division by-purpose (vs my division by API usage),
which is that older kernels do not have attribute support; but to be honest,
if somebody cares about an older kernel without attribute support to cherry-pick
future patches, they should be capable of split the two parts anyway, and
they probably should update the whole hfsplus module wholesale.

<snipped>
>But what difference between inode buffers and xattr name buffers? Inode
>buffer has fixed size and xatr buffer has fixed size. Inodes and xattr
>names comes and goes. Maybe, xattr name buffer has shorter lifetime.
>Efficiency is not only performance but good way of memory using. We know
>that xattr buffer has fixed size and it can be many simultaneous
>requests for such buffers creation. When we create kmem cache then we
>share our knowledge about fixed size nature of data with memory
>subsystem. Otherwise, memory subsystem will work in more sophisticated
>way. Maybe, I am wrong or misunderstand something here. But I see
>similarity between inode buffers and xattr name buffers.

Yes, I think you may be misunderstanding what kmem_cache is supposed
to be used for, from my reading. (I could be wrong, of course). kmem_cache
is a way of avoiding fragmentation: You want to have a lot of long-lasting objects of
the same size, a good example is inodes. There are a lot of them at any one time, but
they come and go; If they are allocated from the generic, it could cause
contiguous memory to be broken down into smaller pieces over time, as
you allocate and free these fixed sized objects, *among objects of other sizes*.
So to avoid fragmentation, you pre-allocate a "pool" of a number of slots
of these fixed size, and just allocate and free from this pool instead.

So this assumes (1) you want to have *many* of these same size objects
at any one time, but they are allocated/freed all the time, (2) you want to
avoid breaking up larger chunk of contiguous memory just to allocate
these smaller ones.

There is an overhead for setting up the pool (and the expanding/shrinking of
the pool), the tearing down of the pool, etc. Nobody is talkng about speed
vs just plain kmalloc here - maybe it is faster, maybe it is slower, but the
purpose is to avoid memory fragmentation over time.

Neither of these are true for those currently stack-based xattr buffers.
They are short-lived (stack poped at the end of the routine) - so any breaking up
will be soon coalescenced to neighbouring free regions
(spelling? "joined back together" after use, I mean),
and also, you are unlikely to be calling more than a few copies of getattr/settattr
simultaneously, and unlikely to need many of these buffer simultaneously.

Does this sounds reasonable? I have only been doing a few hours of
reading up on this; Anton's messages are fairly informative if on the
brief side - one just google' away the rest of it.

Hin-Tak





^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-04-10 14:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-08  2:19 [PATCH V2] hfsplus: fixes worst-case unicode to char conversion of file names and attributes Hin-Tak Leung
2014-04-08  7:05 ` Vyacheslav Dubeyko
2014-04-08 11:56 ` Sergei Antonov
2014-04-08 15:06 ` Anton Altaparmakov
2014-04-08 16:15   ` Vyacheslav Dubeyko
2014-04-08 16:59 ` Sergei Antonov
  -- strict thread matches above, loose matches on Subject: below --
2014-04-08 13:37 Hin-Tak Leung
2014-04-08 14:08 Hin-Tak Leung
2014-04-08 14:41 ` Sergei Antonov
2014-04-08 14:42 Hin-Tak Leung
2014-04-08 16:07 Hin-Tak Leung
2014-04-08 16:43 Hin-Tak Leung
2014-04-08 16:55 ` Vyacheslav Dubeyko
2014-04-08 17:09   ` Anton Altaparmakov
2014-04-08 17:21     ` Vyacheslav Dubeyko
2014-04-08 19:53 Hin-Tak Leung
2014-04-09  6:46 ` Vyacheslav Dubeyko
2014-04-09  8:58   ` Anton Altaparmakov
2014-04-10  7:04     ` Vyacheslav Dubeyko
2014-04-10  9:13       ` Anton Altaparmakov
2014-04-09 14:28 Hin-Tak Leung
2014-04-10  7:23 ` Vyacheslav Dubeyko
2014-04-10 14:42 Hin-Tak Leung

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).