* Re: ceph and ext4 [not found] ` <CAO47_-9nfO32nwOAQMVdqw6iWGWK+FWb+MD=WeOz_vr4YrrKmw@mail.gmail.com> @ 2011-11-15 14:22 ` Ted Ts'o 2011-11-15 16:43 ` Andreas Dilger 0 siblings, 1 reply; 16+ messages in thread From: Ted Ts'o @ 2011-11-15 14:22 UTC (permalink / raw) To: Christian Brunner; +Cc: Martin Mailand, ceph-devel, linux-ext4 On Tue, Nov 15, 2011 at 09:29:28AM +0100, Christian Brunner wrote: > > I'd love to have a working filesystem for ceph, but the xattr issue > with ext4 really scares me. That's why I'm still wrestling with btrfs. FYI, there are patches to extend ext4 to store xattr's in a separate inode. They were submitted by the Whamcloud (formerly Oracle, Sun, Clustrefs -- the Lustre folks). Unfortunately I haven't had the personal time and review bandwidth to go over the patches, and they haven't been rebased in a while. I'm not opposed to them; they just need some time and loving attention from folks with ext4 expertise to accelerate the process of getting the patches reviewed, and in mainline. - Ted ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2011-11-15 14:22 ` ceph and ext4 Ted Ts'o @ 2011-11-15 16:43 ` Andreas Dilger 2011-11-15 18:43 ` Yehuda Sadeh Weinraub ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Andreas Dilger @ 2011-11-15 16:43 UTC (permalink / raw) To: Ted Ts'o Cc: Christian Brunner, Martin Mailand, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. This will allow xattrs up to 64kB in size (in total) to be stored as efficiently as an external xattr block. Cheers, Andreas On 2011-11-15, at 6:22, Ted Ts'o <tytso@mit.edu> wrote: > On Tue, Nov 15, 2011 at 09:29:28AM +0100, Christian Brunner wrote: >> >> I'd love to have a working filesystem for ceph, but the xattr issue >> with ext4 really scares me. That's why I'm still wrestling with btrfs. > > FYI, there are patches to extend ext4 to store xattr's in a separate > inode. They were submitted by the Whamcloud (formerly Oracle, Sun, > Clustrefs -- the Lustre folks). > > Unfortunately I haven't had the personal time and review bandwidth to > go over the patches, and they haven't been rebased in a while. I'm > not opposed to them; they just need some time and loving attention > from folks with ext4 expertise to accelerate the process of getting > the patches reviewed, and in mainline. > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2011-11-15 16:43 ` Andreas Dilger @ 2011-11-15 18:43 ` Yehuda Sadeh Weinraub 2011-11-21 12:22 ` Increase xattr space by allocating contiguous xattr blocks Yu Jian 2011-12-08 22:59 ` ceph and ext4 Christian Brunner 2 siblings, 0 replies; 16+ messages in thread From: Yehuda Sadeh Weinraub @ 2011-11-15 18:43 UTC (permalink / raw) To: Ted Ts'o Cc: Andreas Dilger, Christian Brunner, Martin Mailand, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu On Tue, Nov 15, 2011 at 8:43 AM, Andreas Dilger <adilger@whamcloud.com> wrote: > Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. > > This will allow xattrs up to 64kB in size (in total) to be stored as efficiently as an external xattr block. > Ted, is having that is a hard requirement? Can this functionality/optimization be added later so that we can leverage ext4 in the mean time, even if not in the most optimal way? Thanks, Yehuda ^ permalink raw reply [flat|nested] 16+ messages in thread
* Increase xattr space by allocating contiguous xattr blocks 2011-11-15 16:43 ` Andreas Dilger 2011-11-15 18:43 ` Yehuda Sadeh Weinraub @ 2011-11-21 12:22 ` Yu Jian 2011-11-21 15:08 ` Theodore Tso 2011-12-08 22:59 ` ceph and ext4 Christian Brunner 2 siblings, 1 reply; 16+ messages in thread From: Yu Jian @ 2011-11-21 12:22 UTC (permalink / raw) To: tytso; +Cc: Andreas Dilger, linux-ext4@vger.kernel.org, Yu Jian Hello Ted, I'm working on the patches for large xattrs support again based on the previous patch made by Kalpak Shah. There are some small issues in the previous patch, I'd fix them and rebase the patch on the latest ext4 codes. Per the thread of http://kerneltrap.org/mailarchive/linux-ext4/2009/2/9/4931244, there are two features need to be implemented: 1) store large xattr value (with value_size > blocksize) in an external xattr inode 2) increase xattr space by allocating a single chunk of up to 64kB (in size) contiguous xattr blocks The first one has been implemented in the previous patch, and I'm working on the second one, which would handle "medium-sized" xattrs more efficiently. Now, I've the same question as that in the above thread: In xattr.{h,c}, all of the macros and functions assume the xattr space is contiguous with entries growing downwards and values growing upwards (aligned to the end of the space). Especially, the create, replace, remove and shift operations of xattrs are all performed inside a contiguous buffer. This is no problem with in-inode xattr space and single external xattr block which is associated with one block buffer. But for multiple xattr blocks, since the data of them would be read into different block buffers, which are not contiguous, most of the existing macros and functions need to be changed. Is this way acceptable? In order to make most of the codes remain as-is, we could allocate a contiguous large buffer (up to 64kB in size) to handle all of the data. However, we have to memcpy the data from block buffers to the large buffer, and after the data are changed, we need memcpy the data back to block buffers to make the data written into the block device. Is this way reasonable? Could you please give me some suggestions on how to solve this issue? Thanks! -- Best regards, Yu Jian ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Increase xattr space by allocating contiguous xattr blocks 2011-11-21 12:22 ` Increase xattr space by allocating contiguous xattr blocks Yu Jian @ 2011-11-21 15:08 ` Theodore Tso 2011-11-22 3:32 ` Yu Jian 2011-11-22 4:53 ` Eric Sandeen 0 siblings, 2 replies; 16+ messages in thread From: Theodore Tso @ 2011-11-21 15:08 UTC (permalink / raw) To: Yu Jian; +Cc: Theodore Tso, Andreas Dilger, linux-ext4@vger.kernel.org On Nov 21, 2011, at 7:22 AM, Yu Jian wrote: > Now, I've the same question as that in the above thread: > In xattr.{h,c}, all of the macros and functions assume the xattr space is contiguous with entries growing downwards and values growing upwards (aligned to the end of the space). Especially, the create, replace, remove and shift operations of xattrs are all performed inside a contiguous buffer. This is no problem with in-inode xattr space and single external xattr block which is associated with one block buffer. But for multiple xattr blocks, since the data of them would be read into different block buffers, which are not contiguous, most of the existing macros and functions need to be changed. Is this way acceptable? It depends on how cleanly you can implement it, and if you can create some better xfstests to exercise xattr operations --- especially if deleting xattrs will cause existing xattrs to get moved around for better packing, we need to be absolutely sure that the code is completely reliable and doesn't end up corrupting some xattr other than the one that is being inserted, deleted, or replaced. Currently, one of the primary xattr tests that is in xfstests (#62) doesn't work on ext4 at all, since it assumes that files are returned by readdir in file creation order for newly created directory. So the lack of test coverage is something that would have to be addressed if we want to do major surgery to the xattr code. I'd suggest creating a new series of test from scratch, since I don't believe test #62 can be easily reworked so that it will work under ext4. > In order to make most of the codes remain as-is, we could allocate a contiguous large buffer (up to 64kB in size) to handle all of the data. However, we have to memcpy the data from block buffers to the large buffer, and after the data are changed, we need memcpy the data back to block buffers to make the data written into the block device. Is this way reasonable? This is no doubt the simpler way to go. The downsides of doing this are pretty obvious: overhead to do the copy, the extra memory pressure, the need to do the memory allocation (vmalloc is slow, since it requires messing with page tables; if you need to count on contiguous free pages, you may end up stalling while you wait for the pressure on the mm defrag routines). Also, if there are people who want to do large amounts of xattr operations on PCIe attached flash, the extra overhead of doing the copy will definitely show up on the benchmarks. The first is approach is no doubt he better one, but it at the same time it would be tricker to implement. If it would be totally up to me, I'd suggest the first approach, but it would need to be approached with care (and a lot of testing). Regards, -- Ted ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Increase xattr space by allocating contiguous xattr blocks 2011-11-21 15:08 ` Theodore Tso @ 2011-11-22 3:32 ` Yu Jian 2011-11-22 4:53 ` Eric Sandeen 1 sibling, 0 replies; 16+ messages in thread From: Yu Jian @ 2011-11-22 3:32 UTC (permalink / raw) To: Theodore Tso; +Cc: Andreas Dilger, linux-ext4@vger.kernel.org On 11/21/11 11:08 PM, Theodore Tso wrote: > > On Nov 21, 2011, at 7:22 AM, Yu Jian wrote: > >> Now, I've the same question as that in the above thread: >> In xattr.{h,c}, all of the macros and functions assume the xattr space is contiguous with entries growing downwards and values growing upwards (aligned to the end of the space). Especially, the create, replace, remove and shift operations of xattrs are all performed inside a contiguous buffer. This is no problem with in-inode xattr space and single external xattr block which is associated with one block buffer. But for multiple xattr blocks, since the data of them would be read into different block buffers, which are not contiguous, most of the existing macros and functions need to be changed. Is this way acceptable? > > It depends on how cleanly you can implement it, and if you can create some better xfstests to exercise xattr operations --- especially if deleting xattrs will cause existing xattrs to get moved around for better packing, we need to be absolutely sure that the code is completely reliable and doesn't end up corrupting some xattr other than the one that is being inserted, deleted, or replaced. > > Currently, one of the primary xattr tests that is in xfstests (#62) doesn't work on ext4 at all, since it assumes that files are returned by readdir in file creation order for newly created directory. So the lack of test coverage is something that would have to be addressed if we want to do major surgery to the xattr code. I'd suggest creating a new series of test from scratch, since I don't believe test #62 can be easily reworked so that it will work under ext4. > >> In order to make most of the codes remain as-is, we could allocate a contiguous large buffer (up to 64kB in size) to handle all of the data. However, we have to memcpy the data from block buffers to the large buffer, and after the data are changed, we need memcpy the data back to block buffers to make the data written into the block device. Is this way reasonable? > > This is no doubt the simpler way to go. The downsides of doing this are pretty obvious: overhead to do the copy, the extra memory pressure, the need to do the memory allocation (vmalloc is slow, since it requires messing with page tables; if you need to count on contiguous free pages, you may end up stalling while you wait for the pressure on the mm defrag routines). Also, if there are people who want to do large amounts of xattr operations on PCIe attached flash, the extra overhead of doing the copy will definitely show up on the benchmarks. > > The first is approach is no doubt he better one, but it at the same time it would be tricker to implement. If it would be totally up to me, I'd suggest the first approach, but it would need to be approached with care (and a lot of testing). Thanks Ted for the suggestions. I'd use the first approach. > > Regards, > > -- Ted > > -- Best regards, Yu Jian ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Increase xattr space by allocating contiguous xattr blocks 2011-11-21 15:08 ` Theodore Tso 2011-11-22 3:32 ` Yu Jian @ 2011-11-22 4:53 ` Eric Sandeen 2011-11-23 17:09 ` [PATCH] xfstests: Sort recursive getfattr output in 062 Eric Sandeen 1 sibling, 1 reply; 16+ messages in thread From: Eric Sandeen @ 2011-11-22 4:53 UTC (permalink / raw) To: Theodore Tso; +Cc: Yu Jian, Andreas Dilger, linux-ext4@vger.kernel.org -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/21/11 9:08 AM, Theodore Tso wrote: > Currently, one of the primary xattr tests that is in xfstests (#62) > doesn't work on ext4 at all, since it assumes that files are returned > by readdir in file creation order for newly created directory. So > the lack of test coverage is something that would have to be > addressed if we want to do major surgery to the xattr code. I'd > suggest creating a new series of test from scratch, since I don't > believe test #62 can be easily reworked so that it will work under > ext4. Are you sure? # grep supported 062 _supported_fs generic Hm, but it doesn't actually work does it. :/ Maybe we could do find | sort | xargs ... but I guess it's explicitly supposed to test the getfattr recursive code. - -Eric -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJOyyrdAAoJECCuFpLhPd7gjqwP/2ZMeQedGBvsmPLvxlD/oej+ dZ/U8DSV1kKo7A+2Su7LSB04LuuxrQ97AJBsVTfPyqxB8PFiQ+YQMvkqWwO4xEPO abhr5Y0e9s+L/aOQ40UWQQE+X4LLrHOjakv0b4oXJgor7piuCRMhLiT1lejzml72 Vg6bYNoQ0uDeNZSJyjRscexg14/uJNz052gQ6QDz7jKZpYPbD5EJr0y63i9MadbM H8do7kvQOzuIg4UtPOlt9ce52pstihsCL4rGrKZNghDFbmqnSN2SLVx74QyphXYc T1QqxqxYIiuM2QF6wiZP7mXvk7w1OOyBczOxXGWW7WBrlfg+eD2uW93yiCxgpIPD 1Ljbo+dbXRlVGTEb/k9nrkn/KQjU/MqNkfImGVtjZLK3hEOcbWuh9ojEGoSfKnfx LQ1V6Jqb4VcFxP55Nkrl8ROCp4FahorMWFVRl3hh/lSBZP7h3ogtzE9GCBQfygRP n0ZPBDOJ2H87L1q5NGnCqkqtwY05dTuVQZD6xdlV8Z/eogOSdVTu4/Icb4k8a6cK H98X7u+1iVy1meDqtqrvOOOgq0jss3ZvVl7vBToo7OsInhlbDHHXY0fbxBAmfguo 1KP5cMl069KPY7WeydXGjynh2nrYh1F7BGCWp0a93REcg7NTl6E7XONuT4E0ohsO oPX9piyNcZzgYHDOqfon =HL4p -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] xfstests: Sort recursive getfattr output in 062 2011-11-22 4:53 ` Eric Sandeen @ 2011-11-23 17:09 ` Eric Sandeen 2011-11-28 11:03 ` Christoph Hellwig 2012-01-25 21:38 ` [PATCH V2] " Eric Sandeen 0 siblings, 2 replies; 16+ messages in thread From: Eric Sandeen @ 2011-11-23 17:09 UTC (permalink / raw) To: Theodore Tso; +Cc: Yu Jian, Andreas Dilger, linux-ext4@vger.kernel.org, xfs-oss -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Test 062 was made "generic" a while back, but it fails on any filesystem which returns getfattr -R results (aka readdir results) in something other than inode-order. With a little awk-fu we can sort the records from getfattr -R so that the output is the same for xfs as well as ext4, etc. Also filter out lost+found which extN creates at mkfs time, but some other filesystems do not. Signed-off-by: Eric Sandeen <sandeen@redhat.com> - --- diff --git a/062 b/062 index f666e1b..7005c4e 100755 - --- a/062 +++ b/062 @@ -46,6 +46,13 @@ _cleanup() } trap "_cleanup; exit \$status" 0 1 2 3 15 +# getfattr -R returns info in readdir order which varies from fs to fs. +# This sorts the output by filename +_sort_getfattr_output() +{ + awk '{a[FNR]=$0}END{n = asort(a); for(i=1; i <= n; i++) print a[i],"\n"}' RS='' +} + getfattr() { $GETFATTR_PROG --absolute-names -dh $@ 2>&1 | _filter_scratch @@ -67,7 +74,7 @@ _create_test_bed() mknod $SCRATCH_MNT/dev/c c 0 0 mknod $SCRATCH_MNT/dev/p p # sanity check - - find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch + find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch | grep -v "lost+found" } # real QA test starts here @@ -160,18 +167,18 @@ _extend_test_bed() # whack a symlink in the middle, just to be difficult ln -s $SCRATCH_MNT/here/up $SCRATCH_MNT/descend/and # dump out our new starting point - - find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch + find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch | grep -v "lost+found" } _extend_test_bed echo echo "*** directory descent with us following symlinks" - -getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT +getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT | _sort_getfattr_output echo echo "*** directory descent without following symlinks" - -getfattr -h -P -R -m '.' -e hex $SCRATCH_MNT +getfattr -h -P -R -m '.' -e hex $SCRATCH_MNT | _sort_getfattr_output # diff --git a/062.out b/062.out index 699254a..7d05c85 100644 - --- a/062.out +++ b/062.out @@ -508,115 +508,115 @@ SCRATCH_MNT/lnk SCRATCH_MNT/reg *** directory descent with us following symlinks - -# file: SCRATCH_MNT/reg - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface +# file: SCRATCH_MNT/descend +user.1=0x3233 +user.x=0x797a - -# file: SCRATCH_MNT/dir - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface +# file: SCRATCH_MNT/descend/and/ascend +trusted.9=0x3837 +trusted.a=0x6263 - -# file: SCRATCH_MNT/lnk - -trusted.name=0xbabe - -trusted.name3=0xdeface +# file: SCRATCH_MNT/descend/down +user.1=0x3233 +user.x=0x797a + +# file: SCRATCH_MNT/descend/down/here +user.1=0x3233 +user.x=0x797a # file: SCRATCH_MNT/dev/b trusted.name=0xbabe - -trusted.name3=0xdeface +trusted.name3=0xdeface # file: SCRATCH_MNT/dev/c trusted.name=0xbabe - -trusted.name3=0xdeface +trusted.name3=0xdeface # file: SCRATCH_MNT/dev/p trusted.name=0xbabe +trusted.name3=0xdeface + +# file: SCRATCH_MNT/dir +trusted.name=0xbabe trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface # file: SCRATCH_MNT/here trusted.9=0x3837 - -trusted.a=0x6263 +trusted.a=0x6263 # file: SCRATCH_MNT/here/up trusted.9=0x3837 - -trusted.a=0x6263 +trusted.a=0x6263 # file: SCRATCH_MNT/here/up/ascend trusted.9=0x3837 - -trusted.a=0x6263 +trusted.a=0x6263 +# file: SCRATCH_MNT/lnk +trusted.name=0xbabe +trusted.name3=0xdeface + +# file: SCRATCH_MNT/reg +trusted.name=0xbabe +trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface + + +*** directory descent without following symlinks # file: SCRATCH_MNT/descend user.1=0x3233 - -user.x=0x797a +user.x=0x797a # file: SCRATCH_MNT/descend/down user.1=0x3233 - -user.x=0x797a +user.x=0x797a # file: SCRATCH_MNT/descend/down/here user.1=0x3233 - -user.x=0x797a - - - -# file: SCRATCH_MNT/descend/and/ascend - -trusted.9=0x3837 - -trusted.a=0x6263 - - - - - -*** directory descent without following symlinks - -# file: SCRATCH_MNT/reg - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface - - - -# file: SCRATCH_MNT/dir - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface - - - -# file: SCRATCH_MNT/lnk - -trusted.name=0xbabe - -trusted.name3=0xdeface +user.x=0x797a # file: SCRATCH_MNT/dev/b trusted.name=0xbabe - -trusted.name3=0xdeface +trusted.name3=0xdeface # file: SCRATCH_MNT/dev/c trusted.name=0xbabe - -trusted.name3=0xdeface +trusted.name3=0xdeface # file: SCRATCH_MNT/dev/p trusted.name=0xbabe +trusted.name3=0xdeface + +# file: SCRATCH_MNT/dir +trusted.name=0xbabe trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface # file: SCRATCH_MNT/here trusted.9=0x3837 - -trusted.a=0x6263 +trusted.a=0x6263 # file: SCRATCH_MNT/here/up trusted.9=0x3837 - -trusted.a=0x6263 +trusted.a=0x6263 # file: SCRATCH_MNT/here/up/ascend trusted.9=0x3837 - -trusted.a=0x6263 +trusted.a=0x6263 - -# file: SCRATCH_MNT/descend - -user.1=0x3233 - -user.x=0x797a - - - -# file: SCRATCH_MNT/descend/down - -user.1=0x3233 - -user.x=0x797a +# file: SCRATCH_MNT/lnk +trusted.name=0xbabe +trusted.name3=0xdeface - -# file: SCRATCH_MNT/descend/down/here - -user.1=0x3233 - -user.x=0x797a +# file: SCRATCH_MNT/reg +trusted.name=0xbabe +trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJOzSjQAAoJECCuFpLhPd7gsJIP/j2BKTvft/QjnLxs+gJT/f0S H7L9r/izHcQ5Z/xk9t4unpOr35EsXOuwMzWN8OrB3fCx0J7gkIj7s7dNFrou8K8w bkVW6lzy5VUOHPzkqFSNLHSf2YLrrXNfPBLgOQl8A4lkNzY4gjy74uw05bLL4z3y u+gmjwuOMRTwGe3N9l/q6VR7UybHRyYBE62Ee2SPBa59FFWdicqzQBGVj0OOAdQg asTekASb8fXv/A4GNMFyNpZtA65ov8puISeGjV/A9Dhrx843qn1IFLA3UfcWEPEf qF5MDkg0ZuDDzN8YOiZM5S6mY7KAGSiKAHlF9nNZistCMJy4UDWh2lGLCIDEoO8l oRpbEOQy8tq7v7ppgp4kmh+8aIj0bletjZbgumPiu8KrRNCyXlBpGYGsBIPH+wgn P0i7EKILkF7gRM6Gs4U7Ek88mLcDSrsRGjI7JWQgOufUoo9noZM4pBlfw+ngx3DC NlhxisqyWcwDAVV1MBNvoSNGB5iLNCTgb/Ppgb7SdwNJ/heX9EAgKTsTnyENjtYe ILcuWf0CLYRPG9gun3JIXS/IibJY4Eqi3E9jo+Et5q6eOGTFuD0hjgERULUzH9d2 wwWFq7KPtdJJbWK0mANn2HK0GQH79gskmWsFS8Mxl7JCSMfPOFdbHNSbqdzNKXJS +OIS7v24/L/pNQNtuqEt =hW9Q -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] xfstests: Sort recursive getfattr output in 062 2011-11-23 17:09 ` [PATCH] xfstests: Sort recursive getfattr output in 062 Eric Sandeen @ 2011-11-28 11:03 ` Christoph Hellwig 2012-01-25 21:38 ` [PATCH V2] " Eric Sandeen 1 sibling, 0 replies; 16+ messages in thread From: Christoph Hellwig @ 2011-11-28 11:03 UTC (permalink / raw) To: Eric Sandeen Cc: Theodore Tso, linux-ext4@vger.kernel.org, Yu Jian, Andreas Dilger, xfs-oss On Wed, Nov 23, 2011 at 11:09:37AM -0600, Eric Sandeen wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Test 062 was made "generic" a while back, but it fails on any filesystem > which returns getfattr -R results (aka readdir results) in something > other than inode-order. > > With a little awk-fu we can sort the records from getfattr -R so that > the output is the same for xfs as well as ext4, etc. > > Also filter out lost+found which extN creates at mkfs time, but > some other filesystems do not. Looks fine to me, except that I'd put the sorting helper into common.attr. Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH V2] xfstests: Sort recursive getfattr output in 062 2011-11-23 17:09 ` [PATCH] xfstests: Sort recursive getfattr output in 062 Eric Sandeen 2011-11-28 11:03 ` Christoph Hellwig @ 2012-01-25 21:38 ` Eric Sandeen 2012-01-27 10:53 ` Christoph Hellwig 1 sibling, 1 reply; 16+ messages in thread From: Eric Sandeen @ 2012-01-25 21:38 UTC (permalink / raw) To: Eric Sandeen Cc: Theodore Tso, linux-ext4@vger.kernel.org, Yu Jian, Andreas Dilger, xfs-oss -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Test 062 was made "generic" a while back, but it fails on any filesystem which returns getfattr -R results (aka readdir results) in something other than inode-order. With a little awk-fu we can sort the records from getfattr -R so that the output is the same for xfs as well as ext4, etc. Also filter out lost+found which extN creates at mkfs time, but some other filesystems do not. Signed-off-by: Eric Sandeen <sandeen@redhat.com> - --- V2: move sorting function into common.attr diff --git a/062 b/062 index f666e1b..9800e33 100755 - --- a/062 +++ b/062 @@ -67,7 +67,7 @@ _create_test_bed() mknod $SCRATCH_MNT/dev/c c 0 0 mknod $SCRATCH_MNT/dev/p p # sanity check - - find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch + find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch | grep -v "lost+found" } # real QA test starts here @@ -160,18 +160,18 @@ _extend_test_bed() # whack a symlink in the middle, just to be difficult ln -s $SCRATCH_MNT/here/up $SCRATCH_MNT/descend/and # dump out our new starting point - - find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch + find $SCRATCH_MNT | LC_COLLATE=POSIX sort | _filter_scratch | grep -v "lost+found" } _extend_test_bed echo echo "*** directory descent with us following symlinks" - -getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT +getfattr -h -L -R -m '.' -e hex $SCRATCH_MNT | _sort_getfattr_output echo echo "*** directory descent without following symlinks" - -getfattr -h -P -R -m '.' -e hex $SCRATCH_MNT +getfattr -h -P -R -m '.' -e hex $SCRATCH_MNT | _sort_getfattr_output # diff --git a/062.out b/062.out index 699254a..8cc3c65 100644 - --- a/062.out +++ b/062.out @@ -508,21 +508,21 @@ SCRATCH_MNT/lnk SCRATCH_MNT/reg *** directory descent with us following symlinks - -# file: SCRATCH_MNT/reg - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface +# file: SCRATCH_MNT/descend +user.1=0x3233 +user.x=0x797a - -# file: SCRATCH_MNT/dir - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface +# file: SCRATCH_MNT/descend/and/ascend +trusted.9=0x3837 +trusted.a=0x6263 - -# file: SCRATCH_MNT/lnk - -trusted.name=0xbabe - -trusted.name3=0xdeface +# file: SCRATCH_MNT/descend/down +user.1=0x3233 +user.x=0x797a + +# file: SCRATCH_MNT/descend/down/here +user.1=0x3233 +user.x=0x797a # file: SCRATCH_MNT/dev/b trusted.name=0xbabe @@ -536,6 +536,12 @@ trusted.name3=0xdeface trusted.name=0xbabe trusted.name3=0xdeface +# file: SCRATCH_MNT/dir +trusted.name=0xbabe +trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface + # file: SCRATCH_MNT/here trusted.9=0x3837 trusted.a=0x6263 @@ -548,6 +554,18 @@ trusted.a=0x6263 trusted.9=0x3837 trusted.a=0x6263 +# file: SCRATCH_MNT/lnk +trusted.name=0xbabe +trusted.name3=0xdeface + +# file: SCRATCH_MNT/reg +trusted.name=0xbabe +trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface + + +*** directory descent without following symlinks # file: SCRATCH_MNT/descend user.1=0x3233 user.x=0x797a @@ -560,28 +578,6 @@ user.x=0x797a user.1=0x3233 user.x=0x797a - -# file: SCRATCH_MNT/descend/and/ascend - -trusted.9=0x3837 - -trusted.a=0x6263 - - - - - -*** directory descent without following symlinks - -# file: SCRATCH_MNT/reg - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface - - - -# file: SCRATCH_MNT/dir - -trusted.name=0xbabe - -trusted.name3=0xdeface - -user.name=0xbabe - -user.name3=0xdeface - - - -# file: SCRATCH_MNT/lnk - -trusted.name=0xbabe - -trusted.name3=0xdeface - - # file: SCRATCH_MNT/dev/b trusted.name=0xbabe trusted.name3=0xdeface @@ -594,6 +590,12 @@ trusted.name3=0xdeface trusted.name=0xbabe trusted.name3=0xdeface +# file: SCRATCH_MNT/dir +trusted.name=0xbabe +trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface + # file: SCRATCH_MNT/here trusted.9=0x3837 trusted.a=0x6263 @@ -606,17 +608,15 @@ trusted.a=0x6263 trusted.9=0x3837 trusted.a=0x6263 - -# file: SCRATCH_MNT/descend - -user.1=0x3233 - -user.x=0x797a - - - -# file: SCRATCH_MNT/descend/down - -user.1=0x3233 - -user.x=0x797a +# file: SCRATCH_MNT/lnk +trusted.name=0xbabe +trusted.name3=0xdeface - -# file: SCRATCH_MNT/descend/down/here - -user.1=0x3233 - -user.x=0x797a +# file: SCRATCH_MNT/reg +trusted.name=0xbabe +trusted.name3=0xdeface +user.name=0xbabe +user.name3=0xdeface diff --git a/common.attr b/common.attr index 3e2ba85..b49d990 100644 - --- a/common.attr +++ b/common.attr @@ -176,6 +176,13 @@ _require_attrs() rm -f $TEST_DIR/syscalltest.out } +# getfattr -R returns info in readdir order which varies from fs to fs. +# This sorts the output by filename +_sort_getfattr_output() +{ + awk '{a[FNR]=$0}END{n = asort(a); for(i=1; i <= n; i++) print a[i]"\n"}' RS='' +} + # set maximum total attr space based on fs type if [ "$FSTYP" == "xfs" -o "$FSTYP" == "udf" ]; then MAX_ATTRS=1000 -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIcBAEBAgAGBQJPIHZtAAoJECCuFpLhPd7g2yMP/1VU2RhD8gdfpJIQKGvZjurP XGmv2tLY/3eDWTwbpm3E5m3EmXIAZpq+ahUYT9C+I8afw0oLv4zggAQ3M6P20JlM dM/M7imNHYfafIMr7wIw0+Iv0pt/tpDv3cm2kxJDJq+yW88Y+tRG/ifU8uSjREyX WBSqKjo3mJ0CJ03Chh9xWDK7AV4reYW3cxlCNMMzUrRPho7hWIrZN58GhDO4esqa x11eFp4qk6I5ApExfpa5PJIGTT7xepCsQsDSaf2HNKzSEg3c5fzV43nHUFl+qRNH FQ0CPx10VMhxtr3RZI7laN01yHk5qWjMFB9ufldot6Xv/CFwU++zBVH21IwKAgvR D/tihuxUf8Sllpa0+DuMTuobg2cZuvlyshkWeU4YGAbyFfxirPNVjqEcnfnIMnoA +C3Sc/8DTd70C8WG9M5ON9sMmGtJr/s93OAZMwiLHGvBC1GZf51BvAI+l7ln3Ug3 jws77r4x1P01da6ciUsESNibCrilCiw3Q2XcYJU/VfbxpFZ0ro0ib3wA4O654vqN V38zh7BOipn7ukWIn8KRc09d9X/PqsqeFNVaLJA+tMsxH021J8uluG4nTmn3a+jM t79YFK0JccMk8t0ShCTUjhnhHXwdrVVs8MCCghc9QoYg2/HEQrcDPKft69hM9UTL OrK3+QKMf3JATAYdgXSg =w2HJ -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH V2] xfstests: Sort recursive getfattr output in 062 2012-01-25 21:38 ` [PATCH V2] " Eric Sandeen @ 2012-01-27 10:53 ` Christoph Hellwig 0 siblings, 0 replies; 16+ messages in thread From: Christoph Hellwig @ 2012-01-27 10:53 UTC (permalink / raw) To: Eric Sandeen Cc: Eric Sandeen, Andreas Dilger, linux-ext4@vger.kernel.org, Theodore Tso, Yu Jian, xfs-oss On Wed, Jan 25, 2012 at 03:38:54PM -0600, Eric Sandeen wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Test 062 was made "generic" a while back, but it fails on any filesystem > which returns getfattr -R results (aka readdir results) in something > other than inode-order. > > With a little awk-fu we can sort the records from getfattr -R so that > the output is the same for xfs as well as ext4, etc. > > Also filter out lost+found which extN creates at mkfs time, but > some other filesystems do not. > > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > - --- > > V2: move sorting function into common.attr Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2011-11-15 16:43 ` Andreas Dilger 2011-11-15 18:43 ` Yehuda Sadeh Weinraub 2011-11-21 12:22 ` Increase xattr space by allocating contiguous xattr blocks Yu Jian @ 2011-12-08 22:59 ` Christian Brunner 2011-12-09 22:24 ` Andreas Dilger 2 siblings, 1 reply; 16+ messages in thread From: Christian Brunner @ 2011-12-08 22:59 UTC (permalink / raw) To: Andreas Dilger Cc: ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu 2011/11/15 Andreas Dilger <adilger@whamcloud.com>: > Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. > > This will allow xattrs up to 64kB in size (in total) to be stored as efficiently as an external xattr block. > Did you make progress with this. I'm still having serious trouble with btrfs and would like to try these. Thanks, Christian ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2011-12-08 22:59 ` ceph and ext4 Christian Brunner @ 2011-12-09 22:24 ` Andreas Dilger 2012-02-12 4:05 ` Yehuda Sadeh Weinraub 0 siblings, 1 reply; 16+ messages in thread From: Andreas Dilger @ 2011-12-09 22:24 UTC (permalink / raw) To: chb; +Cc: ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu On 2011-12-08, at 3:59 PM, Christian Brunner wrote: > 2011/11/15 Andreas Dilger <adilger@whamcloud.com>: >> Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. > > Did you make progress with this. I'm still having serious trouble with > btrfs and would like to try these. The latest patches are available at http://review.whamcloud.com/1708, but are based on the RHEL6.1 2.6.32 kernel. The work to implement "medium-sized" xattrs was more complex than anticipated, and is not finished yet. The use of external inode xattrs is working, which allows xattr sizes up to 64kB. The 64kB limit is imposed by the VFS and could potentially be increased. Cheers, Andreas -- Andreas Dilger Principal Engineer Whamcloud, Inc. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2011-12-09 22:24 ` Andreas Dilger @ 2012-02-12 4:05 ` Yehuda Sadeh Weinraub 2012-02-12 17:18 ` Andreas Dilger 0 siblings, 1 reply; 16+ messages in thread From: Yehuda Sadeh Weinraub @ 2012-02-12 4:05 UTC (permalink / raw) To: Andreas Dilger Cc: chb, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu On Fri, Dec 9, 2011 at 2:24 PM, Andreas Dilger <adilger@whamcloud.com> wrote: > > On 2011-12-08, at 3:59 PM, Christian Brunner wrote: > > 2011/11/15 Andreas Dilger <adilger@whamcloud.com>: > >> Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. > > > > Did you make progress with this. I'm still having serious trouble with > > btrfs and would like to try these. > > The latest patches are available at http://review.whamcloud.com/1708, but are based on the RHEL6.1 2.6.32 kernel. The work to implement "medium-sized" xattrs was more complex than anticipated, and is not finished yet. > > The use of external inode xattrs is working, which allows xattr sizes up to 64kB. The 64kB limit is imposed by the VFS and could potentially be increased. > I was able to get those compile on a recent kernel. One issue that I see is that it will only use a separate inode for the xattr if the xattr is big enough. However, it may be that we ran out of enough space to set a smaller xattr, and in that case we would fail setting it even though we'd be able to set a larger xattr. Another related issue, is that the number of xattrs that can be used is still limited (by what can be indexed in a single block?). I think that the first issue is substantial, as it exposes unexpected behavior (getting ENOSPC for small sized attrs, while bigger attrs succeed). For the second issue, it seems that when only using small xattrs it doesn't change anything from the old behavior. Yehuda -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2012-02-12 4:05 ` Yehuda Sadeh Weinraub @ 2012-02-12 17:18 ` Andreas Dilger 2012-02-12 19:50 ` Yehuda Sadeh Weinraub 0 siblings, 1 reply; 16+ messages in thread From: Andreas Dilger @ 2012-02-12 17:18 UTC (permalink / raw) To: Yehuda Sadeh Weinraub Cc: chb, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu On 2012-02-11, at 9:05 PM, Yehuda Sadeh Weinraub wrote: > On Fri, Dec 9, 2011 at 2:24 PM, Andreas Dilger <adilger@whamcloud.com> wrote: >> >> On 2011-12-08, at 3:59 PM, Christian Brunner wrote: >>> 2011/11/15 Andreas Dilger <adilger@whamcloud.com>: >>>> Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. >>> >>> Did you make progress with this. I'm still having serious trouble with >>> btrfs and would like to try these. >> >> The latest patches are available at http://review.whamcloud.com/1708, but are based on the RHEL6.1 2.6.32 kernel. The work to implement "medium-sized" xattrs was more complex than anticipated, and is not finished yet. >> >> The use of external inode xattrs is working, which allows xattr sizes up to 64kB. The 64kB limit is imposed by the VFS and could potentially be increased. > > I was able to get those compile on a recent kernel. It would be useful if you could send me the updated patch, if there were any significant changes. > One issue that I > see is that it will only use a separate inode for the xattr if the > xattr is big enough. However, it may be that we ran out of enough > space to set a smaller xattr, and in that case we would fail setting > it even though we'd be able to set a larger xattr. > Another related issue, is that the number of xattrs that can be used > is still limited (by what can be indexed in a single block?). That is true with the current patch. When the "medium sized" xattrs are finished, then there will be 64kB of space for xattrs in the external block(s), which could be both xattr headers and data. > I think that the first issue is substantial, as it exposes unexpected > behavior (getting ENOSPC for small sized attrs, while bigger attrs > succeed). For the second issue, it seems that when only using small > xattrs it doesn't change anything from the old behavior. Cheers, Andreas -- Andreas Dilger Whamcloud, Inc. Principal Lustre Engineer http://www.whamcloud.com/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: ceph and ext4 2012-02-12 17:18 ` Andreas Dilger @ 2012-02-12 19:50 ` Yehuda Sadeh Weinraub 0 siblings, 0 replies; 16+ messages in thread From: Yehuda Sadeh Weinraub @ 2012-02-12 19:50 UTC (permalink / raw) To: Andreas Dilger Cc: chb, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, Jian Yu On Sun, Feb 12, 2012 at 9:18 AM, Andreas Dilger <adilger@whamcloud.com> wrote: > On 2012-02-11, at 9:05 PM, Yehuda Sadeh Weinraub wrote: >> On Fri, Dec 9, 2011 at 2:24 PM, Andreas Dilger <adilger@whamcloud.com> wrote: >>> >>> On 2011-12-08, at 3:59 PM, Christian Brunner wrote: >>>> 2011/11/15 Andreas Dilger <adilger@whamcloud.com>: >>>>> Coincidentally, we have someone working in those patches again. The main obstacle for accepting the previous patch as-is was that Ted wanted to add support for "medium-sized" xattrs that are addressed as a string of blocks, instead of via an inode. >>>> >>>> Did you make progress with this. I'm still having serious trouble with >>>> btrfs and would like to try these. >>> >>> The latest patches are available at http://review.whamcloud.com/1708, but are based on the RHEL6.1 2.6.32 kernel. The work to implement "medium-sized" xattrs was more complex than anticipated, and is not finished yet. >>> >>> The use of external inode xattrs is working, which allows xattr sizes up to 64kB. The 64kB limit is imposed by the VFS and could potentially be increased. >> >> I was able to get those compile on a recent kernel. > > It would be useful if you could send me the updated patch, if there were > any significant changes. I pushed it to the the ext4-large-xattr branch, here: https://github.com/NewDreamNetwork/ceph-client.git > >> One issue that I >> see is that it will only use a separate inode for the xattr if the >> xattr is big enough. However, it may be that we ran out of enough >> space to set a smaller xattr, and in that case we would fail setting >> it even though we'd be able to set a larger xattr. >> Another related issue, is that the number of xattrs that can be used >> is still limited (by what can be indexed in a single block?). > > That is true with the current patch. When the "medium sized" xattrs are > finished, then there will be 64kB of space for xattrs in the external > block(s), which could be both xattr headers and data. Is there any progress with this feature? Is there any public repository where we can take a look at that? Thanks, Yehuda -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2012-02-12 19:50 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <4EC10664.1080501@tuxadero.com> [not found] ` <CAO47_-9nfO32nwOAQMVdqw6iWGWK+FWb+MD=WeOz_vr4YrrKmw@mail.gmail.com> 2011-11-15 14:22 ` ceph and ext4 Ted Ts'o 2011-11-15 16:43 ` Andreas Dilger 2011-11-15 18:43 ` Yehuda Sadeh Weinraub 2011-11-21 12:22 ` Increase xattr space by allocating contiguous xattr blocks Yu Jian 2011-11-21 15:08 ` Theodore Tso 2011-11-22 3:32 ` Yu Jian 2011-11-22 4:53 ` Eric Sandeen 2011-11-23 17:09 ` [PATCH] xfstests: Sort recursive getfattr output in 062 Eric Sandeen 2011-11-28 11:03 ` Christoph Hellwig 2012-01-25 21:38 ` [PATCH V2] " Eric Sandeen 2012-01-27 10:53 ` Christoph Hellwig 2011-12-08 22:59 ` ceph and ext4 Christian Brunner 2011-12-09 22:24 ` Andreas Dilger 2012-02-12 4:05 ` Yehuda Sadeh Weinraub 2012-02-12 17:18 ` Andreas Dilger 2012-02-12 19:50 ` Yehuda Sadeh Weinraub
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).