* Anyone using XFS in production on > 20TiB volumes?
@ 2010-12-22 16:30 Justin Piszcz
  2010-12-22 16:56 ` Emmanuel Florac
  2010-12-22 17:06 ` Chris Wedgwood
  0 siblings, 2 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-22 16:30 UTC (permalink / raw)
To: xfs

Hi,

Is there anyone currently using this in production?

How much RAM is needed when you fsck with many files on such a volume?
Dave Chinner reported 5.5g or so is needed for ~43TB with no inodes.
Any recent issues/bugs one needs to be aware of?

Is inode64 recommended on a 64-bit system?

Any specific 64-bit tweaks/etc for a large 43TiB FS?

Justin.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 16:30 Anyone using XFS in production on > 20TiB volumes? Justin Piszcz
@ 2010-12-22 16:56 ` Emmanuel Florac
  2010-12-22 19:03   ` Eric Sandeen
  2010-12-22 17:06 ` Chris Wedgwood
  1 sibling, 1 reply; 27+ messages in thread

From: Emmanuel Florac @ 2010-12-22 16:56 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs

Le Wed, 22 Dec 2010 11:30:05 -0500 (EST)
Justin Piszcz <jpiszcz@lucidpixels.com> écrivait:

> Is there anyone currently using this in production?

Yup, lots of people do. Currently supporting 28 such systems (from 20
to 76 TiB; most are 39.7 TiB).

> How much ram is needed when you fsck with many files on such a
> volume? Dave Chinner reported 5.5g or so is needed for ~43TB with no
> inodes. Any recent issues/bugs one needs to be aware of?

I never had any trouble running xfs_repair on 39.7 TB+ systems with 8 GB
of RAM.

> Is inode64 recommended on a 64-bit system?

Sure, though 32-bit clients may choke sometimes; it's limited to some
weird programs.

> Any specific 64-bit tweaks/etc for a large 43TiB FS?

Nothing unusual (inode64, noatime, mkfs with lazy-count enabled, etc.).
It should just work.

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-22 16:56 ` Emmanuel Florac @ 2010-12-22 19:03 ` Eric Sandeen 2010-12-23 0:26 ` Emmanuel Florac 0 siblings, 1 reply; 27+ messages in thread From: Eric Sandeen @ 2010-12-22 19:03 UTC (permalink / raw) To: Emmanuel Florac; +Cc: Justin Piszcz, xfs On 12/22/10 10:56 AM, Emmanuel Florac wrote: > Le Wed, 22 Dec 2010 11:30:05 -0500 (EST) > Justin Piszcz <jpiszcz@lucidpixels.com> écrivait: > >> Is there anyone currently using this in production? > > Yup, lots of people do. Currently supporting 28 such systems (from 20 > to 76 TiB, most are 39.7 TiB). > >> How much ram is needed when you fsck with a many files on such a >> volume? Dave Chinner reported 5.5g or so is needed for ~43TB with no >> inodes. Any recent issues/bugs one needs to be aware of? > > I never had any trouble running xfs_repair on 39.7 TB+ systems with 8 GB > of RAM. > >> Is inode64 recommended on a 64-bit system? > > Sure, however 32 bits clients may scoff sometimes, though it's limited > to some weird programs. > >> Any specific 64-bit tweaks/etc for a large 43TiB FS? >> > > Nothing unusual (inode64,noatime, mkfs with lazy-count enabled, etc). It > should just works. yes, inode64 is recommended for such a large filesystem; lazy-count has been default in mkfs for quite some time. noatime if you really need it, I guess. See also http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E which mentions getting your geometry right if it's hardware raid that can't be detected automatically. (maybe we should add inode64 usecases to that too...) -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
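Concretely, the recommendations in this message (inode64, optionally noatime; lazy-count already being the mkfs default) would amount to an fstab entry roughly like the sketch below. The device name and mount point are placeholders, not taken from the thread:

```
# /etc/fstab sketch -- /dev/sdb1 and /data are hypothetical
/dev/sdb1  /data  xfs  inode64,noatime  0  2
```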
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 19:03 ` Eric Sandeen
@ 2010-12-23  0:26   ` Emmanuel Florac
  2010-12-23  0:28     ` Justin Piszcz
  0 siblings, 1 reply; 27+ messages in thread

From: Emmanuel Florac @ 2010-12-23 0:26 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Justin Piszcz, xfs

Le Wed, 22 Dec 2010 13:03:13 -0600 vous écriviez:

> http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
>
> which mentions getting your geometry right if it's hardware raid
> that can't be detected automatically.

Just as a side note: I tried several times to manually set the
filesystem layout to precisely match the underlying hardware RAID
with sunit and swidth, but didn't find that it made a noticeable
difference. On my 39.9 TB systems the default agcount is 39, while the
optimum would be (theoretically at least) 42.

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 0:26 ` Emmanuel Florac @ 2010-12-23 0:28 ` Justin Piszcz 2010-12-23 0:56 ` Dave Chinner 2010-12-23 1:10 ` Emmanuel Florac 0 siblings, 2 replies; 27+ messages in thread From: Justin Piszcz @ 2010-12-23 0:28 UTC (permalink / raw) To: Emmanuel Florac; +Cc: Eric Sandeen, xfs [-- Attachment #1: Type: TEXT/PLAIN, Size: 1152 bytes --] On Thu, 23 Dec 2010, Emmanuel Florac wrote: > Le Wed, 22 Dec 2010 13:03:13 -0600 vous écriviez: > >> http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E >> >> which mentions getting your geometry right if it's hardware raid >> that can't be detected automatically. > > Just as a side note : I tried several times to manually set the > filesystem layout to precisely match the underlying hardware RAID > with sunit and swidth but didn't find that it made a noticeable > difference. On my 39.9 TB systems, the default agcount is 39, while the > optimum would be (theorically at least) 42. > > -- > ------------------------------------------------------------------------ > Emmanuel Florac | Direction technique > | Intellique > | <eflorac@intellique.com> > | +33 1 78 94 84 02 > ------------------------------------------------------------------------ > Hi, I concur, for hardware raid (at least on 3ware cards) I have found it makes no difference, thanks for confirming. Which RAID cards did you use? Justin. [-- Attachment #2: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 0:28 ` Justin Piszcz @ 2010-12-23 0:56 ` Dave Chinner 2010-12-23 9:43 ` Justin Piszcz 2010-12-23 1:10 ` Emmanuel Florac 1 sibling, 1 reply; 27+ messages in thread From: Dave Chinner @ 2010-12-23 0:56 UTC (permalink / raw) To: Justin Piszcz; +Cc: Eric Sandeen, xfs On Wed, Dec 22, 2010 at 07:28:29PM -0500, Justin Piszcz wrote: > > > On Thu, 23 Dec 2010, Emmanuel Florac wrote: > > >Le Wed, 22 Dec 2010 13:03:13 -0600 vous écriviez: > > > >>http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E > >> > >>which mentions getting your geometry right if it's hardware raid > >>that can't be detected automatically. > > > >Just as a side note : I tried several times to manually set the > >filesystem layout to precisely match the underlying hardware RAID > >with sunit and swidth but didn't find that it made a noticeable > >difference. On my 39.9 TB systems, the default agcount is 39, while the > >optimum would be (theorically at least) 42. > > Hi, I concur, for hardware raid (at least on 3ware cards) I have > found it makes no difference, thanks for confirming. I'd constrain that statement to "no difference for the workloads and hardware tested". Indeed, testing an empty filesystem will often show no difference in performance, because typically problems don't show up until you've started to age the filesystem significantly. When the filesystem has started to age, the difference between having done lots of stripe unit/width aligned allocation vs none can be very significant.... Hence don't assume that because you can't see any difference on a brand new, empty filesystem there never will be a difference over the life of the filesytem... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  0:56 ` Dave Chinner
@ 2010-12-23  9:43   ` Justin Piszcz
  2010-12-23 12:03     ` Emmanuel Florac
  2010-12-23 18:06     ` Justin Piszcz
  0 siblings, 2 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-23 9:43 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs

On Thu, 23 Dec 2010, Dave Chinner wrote:

> On Wed, Dec 22, 2010 at 07:28:29PM -0500, Justin Piszcz wrote:
>> On Thu, 23 Dec 2010, Emmanuel Florac wrote:
>>> Just as a side note: I tried several times to manually set the
>>> filesystem layout to precisely match the underlying hardware RAID
>>> with sunit and swidth but didn't find that it made a noticeable
>>> difference. [...]
>>
>> Hi, I concur, for hardware raid (at least on 3ware cards) I have
>> found it makes no difference, thanks for confirming.
>
> I'd constrain that statement to "no difference for the workloads
> and hardware tested".
>
> Indeed, testing an empty filesystem will often show no difference in
> performance, because typically problems don't show up until you've
> started to age the filesystem significantly. When the filesystem has
> started to age, the difference between having done lots of stripe
> unit/width aligned allocation vs none can be very significant....
>
> Hence don't assume that because you can't see any difference on a
> brand new, empty filesystem there never will be a difference over
> the life of the filesystem...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

Hi Dave,

So for an 18 disk raid-6 with a 256k stripe you would recommend mkfs.xfs
with su=256k,sw=16 for optimal performance, with the inode64 mount option?

Justin.
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 9:43 ` Justin Piszcz @ 2010-12-23 12:03 ` Emmanuel Florac 2010-12-23 18:06 ` Justin Piszcz 1 sibling, 0 replies; 27+ messages in thread From: Emmanuel Florac @ 2010-12-23 12:03 UTC (permalink / raw) To: Justin Piszcz; +Cc: Eric Sandeen, xfs Le Thu, 23 Dec 2010 04:43:26 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> écrivait: > So for an 18 disk raid-6 with 256k stripe you would recommend: > > mkfs.xfs with su=256k,sw=16 for optimal performance with inode64 > mount option? > Yes that should be the best setting. -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
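As a sanity check on the su=256k,sw=16 recommendation above, the su/sw pair maps to sunit/swidth in 512-byte sectors as shown below. This is a sketch assuming the thread's 18-disk RAID-6 (16 data disks) with a 256 KiB chunk; /dev/sdX is a placeholder device, and no command here touches a disk:

```shell
# 18-disk RAID-6 => 16 data disks; hardware chunk (stripe unit) = 256 KiB
su_kib=256
data_disks=16
sunit=$(( su_kib * 1024 / 512 ))   # stripe unit in 512-byte sectors
swidth=$(( sunit * data_disks ))   # full stripe width in sectors
echo "mkfs.xfs -d su=${su_kib}k,sw=${data_disks} /dev/sdX"   # /dev/sdX is hypothetical
echo "sunit=${sunit} swidth=${swidth}"
```

The sunit/swidth form (512-byte sector units) is what appears in xfs_info output; su/sw in byte/multiplier units is usually less error-prone to type at mkfs time.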
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  9:43 ` Justin Piszcz
  2010-12-23 12:03   ` Emmanuel Florac
@ 2010-12-23 18:06   ` Justin Piszcz
  2010-12-23 18:55     ` Emmanuel Florac
  2010-12-23 21:12     ` Eric Sandeen
  1 sibling, 2 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-23 18:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs

On Thu, 23 Dec 2010, Justin Piszcz wrote:

Hi,

How come parted (using the default "optimal" 1 MiB alignment) is slower
than no partition? In addition, sunit and swidth set properly as
mentioned earlier appear to be _slower_ than the defaults with no
partitions.

http://home.comcast.net/~jpiszcz/20101223/final.html

Justin.
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 18:06 ` Justin Piszcz @ 2010-12-23 18:55 ` Emmanuel Florac 2010-12-23 19:07 ` Justin Piszcz 2010-12-23 19:29 ` Justin Piszcz 2010-12-23 21:12 ` Eric Sandeen 1 sibling, 2 replies; 27+ messages in thread From: Emmanuel Florac @ 2010-12-23 18:55 UTC (permalink / raw) To: Justin Piszcz; +Cc: xfs Le Thu, 23 Dec 2010 13:06:10 -0500 (EST) vous écriviez: > http://home.comcast.net/~jpiszcz/20101223/final.html > Something's wrong with the file create/stat/delete tests. Did you mount with "nobarrier"? Which drives, controller firmware, raid level, stripe width? BTW don't run only one test, it's meaningless. I always run at least 8 cycles (and up to 30 or 40 cycles) and then calculate the average and standard deviation, because one test among a cycle may vary wildly for some reason. You don't need the "char" tests, that doesn't correspond to any real-life usage pattern. Better run bonnie with the -f option, and -x with some large enough value. -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
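The multi-cycle methodology Emmanuel describes (run at least 8 cycles, then compute mean and standard deviation) reduces to a few lines of awk once the runs are collected. The throughput values below are made-up sample numbers, purely to illustrate the reduction, not results from the thread:

```shell
# Eight hypothetical rewrite results (MiB/s) from repeated bonnie++ runs
printf '%s\n' 412 428 419 431 407 425 433 416 > /tmp/rewrite_samples
# Reduce to mean and (population) standard deviation
stats=$(awk '{ n++; s += $1; ss += $1*$1 }
             END { m = s/n; printf "mean=%.1f sd=%.1f", m, sqrt(ss/n - m*m) }' /tmp/rewrite_samples)
echo "$stats"
```

A standard deviation that is small relative to the mean is what justifies quoting a single headline number; a large one signals exactly the wild per-cycle variation Emmanuel warns about.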
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 18:55 ` Emmanuel Florac
@ 2010-12-23 19:07   ` Justin Piszcz
  2010-12-23 19:54     ` Stan Hoeppner
  2010-12-23 21:50     ` Emmanuel Florac
  1 sibling, 2 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-23 19:07 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs

On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Thu, 23 Dec 2010 13:06:10 -0500 (EST) vous écriviez:
>
>> http://home.comcast.net/~jpiszcz/20101223/final.html
>
> Something's wrong with the file create/stat/delete tests. Did you mount
> with "nobarrier"?

No, default mount options. Also, I just changed it and will update the
page in a bit: the RAID was in "balance" mode; with "performance", the
raid-rewrite went to ~420-430 MiB/s, much faster.

> Which drives, controller firmware, raid level, stripe width?

Hitachi 7K3000 7200RPM 3TB drives
Latest firmware, 10.2 I think, for the 9750-24ie
Raid Level = 6
Stripe width = 256k (default)

> BTW don't run only one test, it's meaningless. I always run at least 8
> cycles (and up to 30 or 40 cycles) and then calculate the average and
> standard deviation, because one test among a cycle may vary wildly for
> some reason. You don't need the "char" tests, that doesn't correspond
> to any real-life usage pattern. Better run bonnie with the -f option,
> and -x with some large enough value.

I ran 3 tests and took the average of the 3 runs for each unit test. I
use this test because I have been using it for 3-4+ years, so I can
compare apples to apples. If it's +++ or blank in the HTML, that means
it ran too fast to measure, I believe.

The main question I have is why, when the partition is aligned to 1 MiB
(the default in parted 2.2+, I believe), it is slower than with no
partitions? I will try again with mode=performance on the RAID
controller.

> --
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                     |   Intellique
>                     |   <eflorac@intellique.com>
>                     |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 19:07 ` Justin Piszcz @ 2010-12-23 19:54 ` Stan Hoeppner 2010-12-23 21:48 ` Emmanuel Florac 2010-12-23 21:50 ` Emmanuel Florac 1 sibling, 1 reply; 27+ messages in thread From: Stan Hoeppner @ 2010-12-23 19:54 UTC (permalink / raw) To: xfs Justin Piszcz put forth on 12/23/2010 1:07 PM: > Main wonder I have is why when the partition is aligned to 1MiB, which is > the default in parted 2.2+ I believe, is it slower than with no partitions? Best guess? Those 3TB Hitachi drives use 512 byte translated native 4KB sectors. The 9750-24 ie card doesn't know how to properly align partitions on such drives, and/or you're using something other than fdisk or parted to create your partitions. Currently these are the only two partitioners that can align partitions properly on 512 byte translated/native 4KB sector drives. Thus you're taking a performance hit, same as with the WD "Advanced Format" drives which have 512 byte translated/native 4KB sectors. If you want maximum performance with least configuration headaches, avoid 512B/4KB sector hybrid drives. If you _need_ maximum drive capacity, live with the warts, or jump through hoops to get the partitions aligned, or, live without partitions if you can. -- Stan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 19:54 ` Stan Hoeppner @ 2010-12-23 21:48 ` Emmanuel Florac 2010-12-23 23:21 ` Stan Hoeppner 0 siblings, 1 reply; 27+ messages in thread From: Emmanuel Florac @ 2010-12-23 21:48 UTC (permalink / raw) To: Stan Hoeppner; +Cc: xfs Le Thu, 23 Dec 2010 13:54:14 -0600 vous écriviez: > Best guess? Those 3TB Hitachi drives use 512 byte translated native > 4KB sectors. Yes, I'm sure that no new drive model comes with true 512B sectors anymore... -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 21:48 ` Emmanuel Florac @ 2010-12-23 23:21 ` Stan Hoeppner 0 siblings, 0 replies; 27+ messages in thread From: Stan Hoeppner @ 2010-12-23 23:21 UTC (permalink / raw) To: xfs Emmanuel Florac put forth on 12/23/2010 3:48 PM: > Le Thu, 23 Dec 2010 13:54:14 -0600 vous écriviez: > >> Best guess? Those 3TB Hitachi drives use 512 byte translated native >> 4KB sectors. > > Yes, I'm sure that no new drive model comes with true 512B sectors > anymore... I believe most/all shipping drives of 1TB and smaller still have native 512 byte sectors, dependent on specific vendor/model line of course. It's mainly the 1.5TB and up drives with the hybrid 512/4096 byte sector abomination. It would be far more optimal if they'd just ship native 4K sector drives wouldn't it? Isn't most of Linux already patched for pure 4k sector drives? Is XFS ready for such native 4k sector drives? Are the various RAID cards/SAN array controllers? -- Stan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
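The distinction Stan draws (native 512B, hybrid "512e", native 4K) can be read straight from the logical/physical block sizes the kernel exposes under /sys/block/<dev>/queue/. The sketch below wraps the check in a small classifier so it can be demonstrated without a real device; the sample values stand in for sysfs reads, and the device path in the comment is a placeholder:

```shell
# classify <logical_block_size> <physical_block_size>
classify() {
  if [ "$1" -eq 512 ] && [ "$2" -eq 512 ]; then
    echo "native 512B"
  elif [ "$1" -eq 512 ] && [ "$2" -eq 4096 ]; then
    echo "512e hybrid (4K physical; partitions need 4K alignment)"
  else
    echo "native 4K"
  fi
}

# On a live system, feed it the sysfs values, e.g.:
#   classify "$(cat /sys/block/sda/queue/logical_block_size)" \
#            "$(cat /sys/block/sda/queue/physical_block_size)"
classify 512 4096
```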
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 19:07 ` Justin Piszcz 2010-12-23 19:54 ` Stan Hoeppner @ 2010-12-23 21:50 ` Emmanuel Florac 2010-12-23 22:04 ` Justin Piszcz 1 sibling, 1 reply; 27+ messages in thread From: Emmanuel Florac @ 2010-12-23 21:50 UTC (permalink / raw) To: Justin Piszcz; +Cc: xfs Le Thu, 23 Dec 2010 14:07:13 -0500 (EST) vous écriviez: > Main wonder I have is why when the partition is aligned to 1MiB, > which is the default in parted 2.2+ I believe, is it slower than with > no partitions? 1MiB possibly can't round well on the stripe boundaries. I suppose you could get better results with 64KB or 16KB stripes. Did you try with an LVM in between? -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 21:50 ` Emmanuel Florac
@ 2010-12-23 22:04   ` Justin Piszcz
  0 siblings, 0 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-23 22:04 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs

Hi,

I had not tested with 64KB or 16KB stripes; I used "optimal" (the
default, I believe) in newer 2.2+ parted:

  -a alignment-type, --align alignment-type
      Set alignment for newly created partitions. Valid alignment
      types are:

      none      Use the minimum alignment allowed by the disk type.
      cylinder  Align partitions to cylinders.
      minimal   Use minimum alignment as given by the disk topology
                information. This and the opt value will use layout
                information provided by the disk to align the logical
                partition table addresses to actual physical blocks on
                the disks. The min value is the minimum alignment
                needed to align the partition properly to physical
                blocks, which avoids performance degradation.
      optimal   Use optimum alignment as given by the disk topology
                information. This aligns to a multiple of the physical
                block size in a way that guarantees optimal
                performance.

I'm happy with the performance now. I get 16GB of RAM tomorrow, so
hopefully that'll be enough if I need to xfs_repair.

Justin.

On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Thu, 23 Dec 2010 14:07:13 -0500 (EST) vous écriviez:
>
>> Main wonder I have is why when the partition is aligned to 1MiB,
>> which is the default in parted 2.2+ I believe, is it slower than with
>> no partitions?
>
> 1MiB possibly can't round well on the stripe boundaries. I suppose you
> could get better results with 64KB or 16KB stripes. Did you try with an
> LVM in between?
>
> --
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                     |   Intellique
>                     |   <eflorac@intellique.com>
>                     |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 18:55 ` Emmanuel Florac
  2010-12-23 19:07   ` Justin Piszcz
@ 2010-12-23 19:29   ` Justin Piszcz
  2010-12-23 19:58     ` Stan Hoeppner
  2010-12-24  1:01     ` Stan Hoeppner
  1 sibling, 2 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-23 19:29 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs

On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Thu, 23 Dec 2010 13:06:10 -0500 (EST) vous écriviez:
>
>> http://home.comcast.net/~jpiszcz/20101223/final.html

Please check the updated page:
http://home.comcast.net/~jpiszcz/20101223/final.html

Using a partition shows a slight degradation in the re-write speed but
an increase in performance for sequential output and input with the
mode set to "perform". Looks like this is what I will be using, as it
has the fastest speeds overall except for the rewrite.

Thanks!

Justin.
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 19:29 ` Justin Piszcz @ 2010-12-23 19:58 ` Stan Hoeppner 2010-12-24 1:01 ` Stan Hoeppner 1 sibling, 0 replies; 27+ messages in thread From: Stan Hoeppner @ 2010-12-23 19:58 UTC (permalink / raw) To: xfs Justin Piszcz put forth on 12/23/2010 1:29 PM: > Using a partition shows a slight degredation in the re-write speed but > an increase in performance for sequential output and input with the mode > set to perform. Looks like this is what I will be using as it is the > fastest > speeds overall except for the rewrite. As Dave mentioned earlier, performance may degrade significantly over time as the FS grows and ages, compared to running benchies against an empty filesystem today, especially if your mkfs.xfs parms were off the mark when creating. -- Stan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 19:29 ` Justin Piszcz 2010-12-23 19:58 ` Stan Hoeppner @ 2010-12-24 1:01 ` Stan Hoeppner 1 sibling, 0 replies; 27+ messages in thread From: Stan Hoeppner @ 2010-12-24 1:01 UTC (permalink / raw) To: xfs Justin Piszcz put forth on 12/23/2010 1:29 PM: > Please check the updated page: > http://home.comcast.net/~jpiszcz/20101223/final.html > > Using a partition shows a slight degredation in the re-write speed but > an increase in performance for sequential output and input with the mode > set to perform. Looks like this is what I will be using as it is the > fastest > speeds overall except for the rewrite. If your primary workloads for this array are mostly single user/thread streaming writes/reads then this may be fine. If they are multi-user or multi-threaded random re-write server loads, re-write is the most important metric and you should optimize for that scenario alone, as its performance is most dramatically impacted by parity RAID schemes such as RAID 6. -- Stan _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 18:06 ` Justin Piszcz 2010-12-23 18:55 ` Emmanuel Florac @ 2010-12-23 21:12 ` Eric Sandeen 1 sibling, 0 replies; 27+ messages in thread From: Eric Sandeen @ 2010-12-23 21:12 UTC (permalink / raw) To: Justin Piszcz; +Cc: xfs On 12/23/10 12:06 PM, Justin Piszcz wrote: > > On Thu, 23 Dec 2010, Justin Piszcz wrote: > > > Hi, > > How come parted using (optimal at 1mb alignment is slower than no > partition? because parted got it wrong, sounds like. > In addition, sunit and swidth set properly as mentioned > earlier appears to be _slower_ than defaults with no partitions. stripe unit over an incorrectly aligned partition won't help and I suppose could make it worse. align your partitions, using sector units, to a stripe width unit. Set the stripe width properly on the fs on top of that. -Eric > > http://home.comcast.net/~jpiszcz/20101223/final.html > > Justin. > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
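Eric's advice (align partition starts, in sector units, to a stripe-width boundary) can be checked numerically. The sketch below assumes the thread's 16-data-disk, 256 KiB-chunk array, where a full stripe is 8192 sectors (4 MiB); under that assumption, parted's default 1 MiB start (sector 2048) is not stripe-aligned, which is one plausible reason the partitioned runs came out slower:

```shell
start_sector=2048                              # parted's default 1 MiB offset
swidth_sectors=$(( 256 * 1024 / 512 * 16 ))    # full stripe = 8192 sectors = 4 MiB
if [ $(( start_sector % swidth_sectors )) -eq 0 ]; then
  echo "start $start_sector is stripe-aligned"
else
  next=$(( (start_sector / swidth_sectors + 1) * swidth_sectors ))
  echo "start $start_sector is misaligned; next stripe boundary: sector $next"
fi
```

The same arithmetic works for any chunk size and data-disk count; only the two constants change.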
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-23 0:28 ` Justin Piszcz 2010-12-23 0:56 ` Dave Chinner @ 2010-12-23 1:10 ` Emmanuel Florac 1 sibling, 0 replies; 27+ messages in thread From: Emmanuel Florac @ 2010-12-23 1:10 UTC (permalink / raw) To: Justin Piszcz; +Cc: Eric Sandeen, xfs Le Wed, 22 Dec 2010 19:28:29 -0500 (EST) vous écriviez: > Hi, I concur, for hardware raid (at least on 3ware cards) I have > found it makes no difference, thanks for confirming. > > Which RAID cards did you use? Mostly 3Ware until recently, but I switched to Adaptec. However I'm still running tests to sort out any peculiarity - found quite a lot of "features" since 2008 :) -- ------------------------------------------------------------------------ Emmanuel Florac | Direction technique | Intellique | <eflorac@intellique.com> | +33 1 78 94 84 02 ------------------------------------------------------------------------ _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-22 16:30 Anyone using XFS in production on > 20TiB volumes? Justin Piszcz 2010-12-22 16:56 ` Emmanuel Florac @ 2010-12-22 17:06 ` Chris Wedgwood 2010-12-22 17:10 ` Justin Piszcz 1 sibling, 1 reply; 27+ messages in thread From: Chris Wedgwood @ 2010-12-22 17:06 UTC (permalink / raw) To: Justin Piszcz; +Cc: xfs On Wed, Dec 22, 2010 at 11:30:05AM -0500, Justin Piszcz wrote: > Is there anyone currently using this in production? yes (in the past more than now) > How much ram is needed when you fsck with a many files on such a > volume? didn't check specifically, but with older xfsprogs it could easily use more than 16GB > Is inode64 recommended on a 64-bit system? it's for inode distribution, not 64-bit vs 32-bit system yes, enable that ... it should be the default > Any specific 64-bit tweaks/etc for a large 43TiB FS? if using hw raid teach mkfs about the array geom, that coupled with inode64 made a huge difference in performance that last time i did this _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes? 2010-12-22 17:06 ` Chris Wedgwood @ 2010-12-22 17:10 ` Justin Piszcz 2010-12-22 17:32 ` Chris Wedgwood 0 siblings, 1 reply; 27+ messages in thread From: Justin Piszcz @ 2010-12-22 17:10 UTC (permalink / raw) To: Chris Wedgwood; +Cc: xfs On Wed, 22 Dec 2010, Chris Wedgwood wrote: > if using hw raid teach mkfs about the array geom, that coupled with > inode64 made a huge difference in performance that last time i did > this When I had used XFS in the past on a 3ware 9650SE-16ML with 1TB HDDs I did not notice any performance difference on whether you set the su/swidth/etc. Do you have an example/of what you found? Is it dependent on the RAID card? Justin. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:10 ` Justin Piszcz
@ 2010-12-22 17:32 ` Chris Wedgwood
  2010-12-22 17:35 ` Justin Piszcz
  0 siblings, 1 reply; 27+ messages in thread

From: Chris Wedgwood @ 2010-12-22 17:32 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs

On Wed, Dec 22, 2010 at 12:10:06PM -0500, Justin Piszcz wrote:

> Do you have an example of what you found?

i don't have the numbers anymore, they are with a previous employer.

basically using dbench (these were cifs NAS machines, so dbench seemed
as good or bad as anything to test with) the performance was about 3x
better between 'old' and 'new' with a small number of workers and
about 10x better with a large number

i don't know how much difference each of inode64 and getting the
geometry right made individually, but both were quite measurable in
the graphs i made at the time

from memory the machines are raid50 (4x (5+1)) with 2TB drives, so
about 38TB usable on each one

initially these machines were 3ware controllers and later on LSI (the
two product lines have since merged, so it's not clear how much
difference that makes now)

in testing 16GB for xfs_repair wasn't enough, so they were upped to
64GB; that's likely largely a result of the fact there were 100s of
millions of small files (as well as some large ones)

> Is it dependent on the RAID card?

perhaps, do you have a BBU and enable WC? certainly we found the LSI
cards to be faster in most cases than the (now old) 3ware

where i am now i use larger chassis and no hw raid cards; using sw
raid on these works spectacularly well with the exception of bursts of
small seeky writes (which a BBU + WC soaks up quite well)
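The software-RAID setup Chris mentions can be sketched as follows; the device names, disk count, and chunk size are hypothetical, and the commands are only printed rather than run. One relevant difference from the hardware-RAID case: mkfs.xfs reads the stripe geometry from an md device automatically, so no explicit su/sw is needed.

```shell
# Hypothetical 18-disk software RAID-6 with a 64 KiB chunk.
MD_CMD="mdadm --create /dev/md0 --level=6 --raid-devices=18 --chunk=64 /dev/sd[b-s]"
MKFS_CMD="mkfs.xfs /dev/md0"    # geometry is auto-detected from the md device
echo "$MD_CMD"
echo "$MKFS_CMD"
```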
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:32 ` Chris Wedgwood
@ 2010-12-22 17:35 ` Justin Piszcz
  2010-12-22 18:50 ` Chris Wedgwood
  0 siblings, 1 reply; 27+ messages in thread

From: Justin Piszcz @ 2010-12-22 17:35 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: xfs

On Wed, 22 Dec 2010, Chris Wedgwood wrote:

> basically using dbench (these were cifs NAS machines, so dbench seemed
> as good or bad as anything to test with) the performance was about 3x
> better between 'old' and 'new' with a small number of workers and
> about 10x better with a large number

Is this by specifying sunit/swidth? Can you elaborate on which
parameters you modified?

> i don't know how much difference each of inode64 and getting the
> geometry right made individually, but both were quite measurable in
> the graphs i made at the time
>
> from memory the machines are raid50 (4x (5+1)) with 2TB drives, so
> about 38TB usable on each one
>
> initially these machines were 3ware controllers and later on LSI (the
> two product lines have since merged, so it's not clear how much
> difference that makes now)
>
> in testing 16GB for xfs_repair wasn't enough, so they were upped to
> 64GB; that's likely largely a result of the fact there were 100s of
> millions of small files (as well as some large ones)

Yikes =) Hopefully it's better now?

>> Is it dependent on the RAID card?
>
> perhaps, do you have a BBU and enable WC? certainly we found the LSI
> cards to be faster in most cases than the (now old) 3ware

Yes, and I have it set to perform(ance). Going to be using 19 x 3TB
Hitachi 7200RPM HDDs (18-HDD RAID-6 + 1 hot spare).

> where i am now i use larger chassis and no hw raid cards; using sw
> raid on these works spectacularly well with the exception of bursts
> of small seeky writes (which a BBU + WC soaks up quite well)

Interesting..
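For the array Justin describes (18-drive RAID-6 plus a spare), RAID-6 spends two drives on parity, leaving 16 data disks. The matching XFS geometry can be computed from that; the 64 KiB chunk size is an assumption, since the controller's setting isn't stated in the thread, and the device name is a placeholder:

```shell
# Assumed 64 KiB controller chunk; the 18-drive RAID-6 leaves 16 data disks.
CHUNK_KB=64
TOTAL=18; PARITY=2
DATA=$((TOTAL - PARITY))                # data disks only
FULL_STRIPE_KB=$((CHUNK_KB * DATA))     # KiB written per full stripe
echo "data disks: $DATA, full stripe: ${FULL_STRIPE_KB} KiB"
echo "mkfs.xfs -d su=${CHUNK_KB}k,sw=${DATA} /dev/sdX"
```

sw counts data disks only because parity blocks are the controller's concern; the filesystem should align its writes to the 1 MiB of real data in each full stripe.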
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:35 ` Justin Piszcz
@ 2010-12-22 18:50 ` Chris Wedgwood
  2010-12-22 19:24 ` Justin Piszcz
  0 siblings, 1 reply; 27+ messages in thread

From: Chris Wedgwood @ 2010-12-22 18:50 UTC (permalink / raw)
To: Justin Piszcz; +Cc: xfs

On Wed, Dec 22, 2010 at 12:35:46PM -0500, Justin Piszcz wrote:

> Is this by specifying sunit/swidth?

yes

> Can you elaborate on which parameters you modified?

i set both to match what the hw raid was doing; the lru that is my
brain doesn't have the detail anymore, sorry

at a guess it was probably something like 64k chunk, 20 devices wide

the metadata performance difference between wrong and right is quite
noticeable

> Yes, and I have it set to perform(ance).

be sure you have a bbu; there is a setting to force wc --- i wouldn't
do that

i would have it wc when the battery is good and automatically disable
it when it's bad

i was able to buffer ~490MB of writes in the card without trying hard;
that's a lot of pain to get corrupted

> Going to be using 19 x 3TB Hitachi 7200RPM HDDs (18-HDD RAID-6 + 1
> hot spare).

are those 4k sector drives? i ask because it's not clear if/how the
raid fw deals with these
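The sunit/swidth names and the su/sw names describe the same geometry in different units: su/sw take the chunk size in bytes and the data-disk count, while sunit/swidth are expressed in 512-byte sectors. Using Chris's guess of a 64k chunk and 20 data disks (both assumptions from memory, per his message), the conversion works out as:

```shell
# Convert su/sw (bytes, disk count) to sunit/swidth (512-byte sectors).
SU_BYTES=$((64 * 1024))         # 64 KiB chunk per disk
SW=20                           # data disks
SUNIT=$((SU_BYTES / 512))       # stripe unit in sectors
SWIDTH=$((SUNIT * SW))          # full stripe in sectors
echo "sunit=${SUNIT} swidth=${SWIDTH}"
```

So `-d su=64k,sw=20` and `-d sunit=128,swidth=2560` are equivalent ways of telling mkfs.xfs about the same array.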
* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 18:50 ` Chris Wedgwood
@ 2010-12-22 19:24 ` Justin Piszcz
  0 siblings, 0 replies; 27+ messages in thread

From: Justin Piszcz @ 2010-12-22 19:24 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: xfs

On Wed, 22 Dec 2010, Chris Wedgwood wrote:

> are those 4k sector drives? i ask because it's not clear if/how the
> raid fw deals with these

Hi, they are 512-byte sector drives.
end of thread, other threads:[~2010-12-24 0:59 UTC | newest]

Thread overview: 27+ messages
2010-12-22 16:30 Anyone using XFS in production on > 20TiB volumes? Justin Piszcz
2010-12-22 16:56 ` Emmanuel Florac
2010-12-22 19:03 ` Eric Sandeen
2010-12-23  0:26 ` Emmanuel Florac
2010-12-23  0:28 ` Justin Piszcz
2010-12-23  0:56 ` Dave Chinner
2010-12-23  9:43 ` Justin Piszcz
2010-12-23 12:03 ` Emmanuel Florac
2010-12-23 18:06 ` Justin Piszcz
2010-12-23 18:55 ` Emmanuel Florac
2010-12-23 19:07 ` Justin Piszcz
2010-12-23 19:54 ` Stan Hoeppner
2010-12-23 21:48 ` Emmanuel Florac
2010-12-23 23:21 ` Stan Hoeppner
2010-12-23 21:50 ` Emmanuel Florac
2010-12-23 22:04 ` Justin Piszcz
2010-12-23 19:29 ` Justin Piszcz
2010-12-23 19:58 ` Stan Hoeppner
2010-12-24  1:01 ` Stan Hoeppner
2010-12-23 21:12 ` Eric Sandeen
2010-12-23  1:10 ` Emmanuel Florac
2010-12-22 17:06 ` Chris Wedgwood
2010-12-22 17:10 ` Justin Piszcz
2010-12-22 17:32 ` Chris Wedgwood
2010-12-22 17:35 ` Justin Piszcz
2010-12-22 18:50 ` Chris Wedgwood
2010-12-22 19:24 ` Justin Piszcz