public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* Anyone using XFS in production on > 20TiB volumes?
@ 2010-12-22 16:30 Justin Piszcz
  2010-12-22 16:56 ` Emmanuel Florac
  2010-12-22 17:06 ` Chris Wedgwood
  0 siblings, 2 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-22 16:30 UTC (permalink / raw)
  To: xfs

Hi,

Is there anyone currently using this in production?
How much RAM is needed when you fsck with many files on such a volume?
Dave Chinner reported 5.5g or so is needed for ~43TB with no inodes.
Any recent issues/bugs one needs to be aware of?
Is inode64 recommended on a 64-bit system?
Any specific 64-bit tweaks/etc for a large 43TiB FS?

Justin.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 16:30 Anyone using XFS in production on > 20TiB volumes? Justin Piszcz
@ 2010-12-22 16:56 ` Emmanuel Florac
  2010-12-22 19:03   ` Eric Sandeen
  2010-12-22 17:06 ` Chris Wedgwood
  1 sibling, 1 reply; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-22 16:56 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

On Wed, 22 Dec 2010 11:30:05 -0500 (EST),
Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

> Is there anyone currently using this in production?

Yup, lots of people do. Currently supporting 28 such systems (from 20
to 76 TiB, most are 39.7 TiB).

> How much ram is needed when you fsck with a many files on such a
> volume? Dave Chinner reported 5.5g or so is needed for ~43TB with no
> inodes. Any recent issues/bugs one needs to be aware of?

I never had any trouble running xfs_repair on 39.7 TB+ systems with 8 GB
of RAM.

> Is inode64 recommended on a 64-bit system?

Sure, however 32-bit clients may balk sometimes, though it's limited
to some weird programs.

> Any specific 64-bit tweaks/etc for a large 43TiB FS?
> 

Nothing unusual (inode64, noatime, mkfs with lazy-count enabled, etc.). It
should just work.
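As a concrete sketch of those options (the device and mount point here are placeholders, not taken from this thread):

```shell
# lazy-count is an mkfs-time option (the default on recent xfsprogs)
mkfs.xfs -l lazy-count=1 /dev/sdX

# inode64 and noatime are mount-time options
mount -o inode64,noatime /dev/sdX /srv/data
```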

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 16:30 Anyone using XFS in production on > 20TiB volumes? Justin Piszcz
  2010-12-22 16:56 ` Emmanuel Florac
@ 2010-12-22 17:06 ` Chris Wedgwood
  2010-12-22 17:10   ` Justin Piszcz
  1 sibling, 1 reply; 27+ messages in thread
From: Chris Wedgwood @ 2010-12-22 17:06 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

On Wed, Dec 22, 2010 at 11:30:05AM -0500, Justin Piszcz wrote:

> Is there anyone currently using this in production?

yes (in the past more than now)

> How much ram is needed when you fsck with a many files on such a
> volume?

didn't check specifically, but with older xfsprogs it could easily use
more than 16GB

> Is inode64 recommended on a 64-bit system?

it's about inode number distribution, not 64-bit vs 32-bit systems

yes, enable that ... it should be the default

> Any specific 64-bit tweaks/etc for a large 43TiB FS?

if using hw raid teach mkfs about the array geom; that coupled with
inode64 made a huge difference in performance the last time i did
this
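As a sketch (the geometry values here are illustrative, not measured from these arrays): for a hw raid exporting a 64k chunk across 20 data disks you would tell mkfs something like

```shell
# su = per-disk chunk size, sw = number of data-bearing disks in the stripe
mkfs.xfs -d su=64k,sw=20 /dev/sdX
```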


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:06 ` Chris Wedgwood
@ 2010-12-22 17:10   ` Justin Piszcz
  2010-12-22 17:32     ` Chris Wedgwood
  0 siblings, 1 reply; 27+ messages in thread
From: Justin Piszcz @ 2010-12-22 17:10 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: xfs



On Wed, 22 Dec 2010, Chris Wedgwood wrote:

> if using hw raid teach mkfs about the array geom, that coupled with
> inode64 made a huge difference in performance that last time i did
> this

When I used XFS in the past on a 3ware 9650SE-16ML with 1TB HDDs I did
not notice any performance difference whether or not the su/swidth/etc. were set.

Do you have an example of what you found?
Is it dependent on the RAID card?

Justin.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:10   ` Justin Piszcz
@ 2010-12-22 17:32     ` Chris Wedgwood
  2010-12-22 17:35       ` Justin Piszcz
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Wedgwood @ 2010-12-22 17:32 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

On Wed, Dec 22, 2010 at 12:10:06PM -0500, Justin Piszcz wrote:

> Do you have an example/of what you found?

i don't have the numbers anymore, they are with a previous employer.

basically using dbench (these were cifs NAS machines, so dbench seemed
as good or bad as anything to test with) the performance was about 3x
better between 'old' and 'new' with a small number of workers and
about 10x better with a large number

i don't know how much difference each of inode64 and getting the geom
right made, but both were quite measurable in the graphs i made
at the time


from memory the machines are raid50 (4x (5+1)) with 2TB drives, so
about 38TB usable on each one

initially these machines were 3ware controllers and later on LSI (the
two product lines have since merged so it's not clear how much
difference that makes now)

in testing 16GB for xfs_repair wasn't enough, so they were upped to
64GB, that's likely largely a result of the fact there were 100s of
millions of small files (as well as some large ones)

> Is it dependent on the RAID card?

perhaps, do you have a BBU and enable WC?  certainly we found the LSI
cards to be faster in most cases than the (now old) 3ware


where i am now i use larger chassis and no hw raid cards; using sw
raid on these works spectacularly well with the exception of bursts of
small seeky writes (which a BBU + wc soaks up quite well)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:32     ` Chris Wedgwood
@ 2010-12-22 17:35       ` Justin Piszcz
  2010-12-22 18:50         ` Chris Wedgwood
  0 siblings, 1 reply; 27+ messages in thread
From: Justin Piszcz @ 2010-12-22 17:35 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: xfs



On Wed, 22 Dec 2010, Chris Wedgwood wrote:

> On Wed, Dec 22, 2010 at 12:10:06PM -0500, Justin Piszcz wrote:
>
>> Do you have an example/of what you found?
>
> i don't have the numbers anymore, they are with a previous employer.
>
> basically using dbench (there were cifs NAS machines, so dbench seemed
> as good or bad as anything to test with) the performance was about 3x
> better between 'old' and 'new' with a small number of workers and
> about 10x better with a large number
Is this by specifying the sunit/swidth?
Can you elaborate on which parameters you modified?

>
> i don't know how much difference each of inode64 and getting the geom
> right made each, but bother were quite measurable in the graphs i made
> at the time
>
>
> from memory the machines are raid50 (4x (5+1)) with 2TB drives, so
> about 38TB usable on each one
>
> initially these machines were 3ware controllers and later on LSI (the
> two products lines have since merged so it's not clear how much
> difference that makes now)
>
> in testing 16GB for xfs_repair wasn't enough, so they were upped to
> 64GB, that's likely largely a result of the fact there were 100s of
> millions of small files (as well as some large ones)
Yikes =)  Hopefully it's better now?

>
>> Is it dependent on the RAID card?
>
> perhaps, do you have a BBU and enable WC?  certainly we found the LSI
> cards to be faster in most cases than the (now old) 3ware

Yes and have it set to perform(ance).
Going to be using 19HDD x 3TB Hitachi 7200RPMs (18HDD RAID-6 + 1 hot spare).

>
> where i am now i use larger chassis and no hw raid cards, using sw
> raid on these works spectacularly well with the exception of burst of
> small seeky writes (which a BBU + wc soaks up quite well)
Interesting..

>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 17:35       ` Justin Piszcz
@ 2010-12-22 18:50         ` Chris Wedgwood
  2010-12-22 19:24           ` Justin Piszcz
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Wedgwood @ 2010-12-22 18:50 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

On Wed, Dec 22, 2010 at 12:35:46PM -0500, Justin Piszcz wrote:

> Is this by specifying the sunit/swidth?

yes

> Can you elaborate on which paramters you modified?

i set both to match what the hw raid was doing, the lru that is my
brain doesn't have the detail anymore sorry

at a guess it was probably something like 64k chunk, 20 devices wide


the metadata performance difference between wrong and right is quite
noticeable


> Yes and have it set to perform(ance).

be sure you have a bbu, there is a setting for force wc --- i wouldn't
do that

i would have it wc when the battery is good and automatically disable it
when it's bad

i was able to buffer ~490MB of writes in the card without trying hard,
that's a lot of pain to get corrupted

> Going to be using 19HDD x 3TB Hiatchi 7200RPMs, (18HDD RAID-6 + 1
> hot spare).

are those 4k sector drives?  i ask because it's not clear if/how the
raid fw deals with these


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 16:56 ` Emmanuel Florac
@ 2010-12-22 19:03   ` Eric Sandeen
  2010-12-23  0:26     ` Emmanuel Florac
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sandeen @ 2010-12-22 19:03 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Justin Piszcz, xfs

On 12/22/10 10:56 AM, Emmanuel Florac wrote:
> Le Wed, 22 Dec 2010 11:30:05 -0500 (EST)
> Justin Piszcz <jpiszcz@lucidpixels.com> écrivait:
> 
>> Is there anyone currently using this in production?
> 
> Yup, lots of people do. Currently supporting 28 such systems (from 20
> to 76 TiB, most are 39.7 TiB).
> 
>> How much ram is needed when you fsck with a many files on such a
>> volume? Dave Chinner reported 5.5g or so is needed for ~43TB with no
>> inodes. Any recent issues/bugs one needs to be aware of?
> 
> I never had any trouble running xfs_repair on 39.7 TB+ systems with 8 GB
> of RAM.
> 
>> Is inode64 recommended on a 64-bit system?
> 
> Sure, however 32 bits clients may scoff sometimes, though it's limited
> to some weird programs.
> 
>> Any specific 64-bit tweaks/etc for a large 43TiB FS?
>>
> 
> Nothing unusual (inode64,noatime, mkfs with lazy-count enabled, etc). It
> should just works.

yes, inode64 is recommended for such a large filesystem; lazy-count
has been default in mkfs for quite some time.  noatime if you really
need it, I guess.

See also

http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E

which mentions getting your geometry right if it's hardware raid
that can't be detected automatically.

(maybe we should add inode64 usecases to that too...)

-Eric


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 18:50         ` Chris Wedgwood
@ 2010-12-22 19:24           ` Justin Piszcz
  0 siblings, 0 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-22 19:24 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: xfs



On Wed, 22 Dec 2010, Chris Wedgwood wrote:

> On Wed, Dec 22, 2010 at 12:35:46PM -0500, Justin Piszcz wrote:
>
>> Is this by specifying the sunit/swidth?
>
> yes
>
>> Can you elaborate on which paramters you modified?
>
> i set both to match what the hw raid was doing, the lru that is my
> brain doesn't have the detail anymore sorry
>
> at a guess it was probably something like 64k chunk, 20 devices wide
>
>
> the metadata performance difference between wrong and right is quite
> noticable
>
>
>> Yes and have it set to perform(ance).
>
> be sure you have a bbu, there is a setting for force wc --- i wouldn't
> do that
>
> i would have it wc when the battery is good automatically disable it
> when it's bad
>
> i was able to buffer ~490MB of writes in the card without trying hard,
> that's a lot of pain to get corrupted
>
>> Going to be using 19HDD x 3TB Hiatchi 7200RPMs, (18HDD RAID-6 + 1
>> hot spare).
>
> are those 4k sector drives?  i ask because it's not clear if/how the
> raid fw deals with these
>

Hi, they are 512 byte sector drives.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-22 19:03   ` Eric Sandeen
@ 2010-12-23  0:26     ` Emmanuel Florac
  2010-12-23  0:28       ` Justin Piszcz
  0 siblings, 1 reply; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-23  0:26 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Justin Piszcz, xfs

On Wed, 22 Dec 2010 13:03:13 -0600, you wrote:

> http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
> 
> which mentions getting your geometry right if it's hardware raid
> that can't be detected automatically.

Just as a side note: I tried several times to manually set the
filesystem layout to precisely match the underlying hardware RAID
with sunit and swidth but didn't find that it made a noticeable
difference. On my 39.9 TB systems, the default agcount is 39, while the
optimum would be (theoretically at least) 42.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  0:26     ` Emmanuel Florac
@ 2010-12-23  0:28       ` Justin Piszcz
  2010-12-23  0:56         ` Dave Chinner
  2010-12-23  1:10         ` Emmanuel Florac
  0 siblings, 2 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-23  0:28 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Eric Sandeen, xfs




On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Wed, 22 Dec 2010 13:03:13 -0600 vous écriviez:
>
>> http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
>>
>> which mentions getting your geometry right if it's hardware raid
>> that can't be detected automatically.
>
> Just as a side note : I tried several times to manually set the
> filesystem layout to precisely match the underlying hardware RAID
> with sunit and swidth but didn't find that it made a noticeable
> difference. On my 39.9 TB systems, the default agcount is 39, while the
> optimum would be (theorically at least)  42.
>

Hi, I concur, for hardware raid (at least on 3ware cards) I have found it 
makes no difference, thanks for confirming.

Which RAID cards did you use?

Justin.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  0:28       ` Justin Piszcz
@ 2010-12-23  0:56         ` Dave Chinner
  2010-12-23  9:43           ` Justin Piszcz
  2010-12-23  1:10         ` Emmanuel Florac
  1 sibling, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2010-12-23  0:56 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Eric Sandeen, xfs

On Wed, Dec 22, 2010 at 07:28:29PM -0500, Justin Piszcz wrote:
> 
> 
> On Thu, 23 Dec 2010, Emmanuel Florac wrote:
> 
> >Le Wed, 22 Dec 2010 13:03:13 -0600 vous écriviez:
> >
> >>http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
> >>
> >>which mentions getting your geometry right if it's hardware raid
> >>that can't be detected automatically.
> >
> >Just as a side note : I tried several times to manually set the
> >filesystem layout to precisely match the underlying hardware RAID
> >with sunit and swidth but didn't find that it made a noticeable
> >difference. On my 39.9 TB systems, the default agcount is 39, while the
> >optimum would be (theorically at least)  42.
> 
> Hi, I concur, for hardware raid (at least on 3ware cards) I have
> found it makes no difference, thanks for confirming.

I'd constrain that statement to "no difference for the workloads
and hardware tested".

Indeed, testing an empty filesystem will often show no difference in
performance, because typically problems don't show up until you've
started to age the filesystem significantly. When the filesystem has
started to age, the difference between having done lots of stripe
unit/width aligned allocation vs none can be very significant....

Hence don't assume that because you can't see any difference on a
brand new, empty filesystem there never will be a difference over
the life of the filesystem...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  0:28       ` Justin Piszcz
  2010-12-23  0:56         ` Dave Chinner
@ 2010-12-23  1:10         ` Emmanuel Florac
  1 sibling, 0 replies; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-23  1:10 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Eric Sandeen, xfs

On Wed, 22 Dec 2010 19:28:29 -0500 (EST), you wrote:

> Hi, I concur, for hardware raid (at least on 3ware cards) I have
> found it makes no difference, thanks for confirming.
> 
> Which RAID cards did you use?

Mostly 3Ware until recently, but I switched to Adaptec. However I'm
still running tests to sort out any peculiarities - found quite a lot of
"features" since 2008 :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  0:56         ` Dave Chinner
@ 2010-12-23  9:43           ` Justin Piszcz
  2010-12-23 12:03             ` Emmanuel Florac
  2010-12-23 18:06             ` Justin Piszcz
  0 siblings, 2 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-23  9:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, xfs




On Thu, 23 Dec 2010, Dave Chinner wrote:

> On Wed, Dec 22, 2010 at 07:28:29PM -0500, Justin Piszcz wrote:
>>
>>
>> On Thu, 23 Dec 2010, Emmanuel Florac wrote:
>>
>>> Le Wed, 22 Dec 2010 13:03:13 -0600 vous écriviez:
>>>
>>>> http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
>>>>
>>>> which mentions getting your geometry right if it's hardware raid
>>>> that can't be detected automatically.
>>>
>>> Just as a side note : I tried several times to manually set the
>>> filesystem layout to precisely match the underlying hardware RAID
>>> with sunit and swidth but didn't find that it made a noticeable
>>> difference. On my 39.9 TB systems, the default agcount is 39, while the
>>> optimum would be (theorically at least)  42.
>>
>> Hi, I concur, for hardware raid (at least on 3ware cards) I have
>> found it makes no difference, thanks for confirming.
>
> I'd constrain that statement to "no difference for the workloads
> and hardware tested".
>
> Indeed, testing an empty filesystem will often show no difference in
> performance, because typically problems don't show up until you've
> started to age the filesystem significantly. When the filesystem has
> started to age, the difference between having done lots of stripe
> unit/width aligned allocation vs none can be very significant....
>
> Hence don't assume that because you can't see any difference on a
> brand new, empty filesystem there never will be a difference over
> the life of the filesytem...
>
> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

Hi Dave,

So for an 18 disk raid-6 with 256k stripe you would recommend:

mkfs.xfs with su=256k,sw=16 for optimal performance with inode64 mount 
option?

Justin.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  9:43           ` Justin Piszcz
@ 2010-12-23 12:03             ` Emmanuel Florac
  2010-12-23 18:06             ` Justin Piszcz
  1 sibling, 0 replies; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-23 12:03 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Eric Sandeen, xfs

On Thu, 23 Dec 2010 04:43:26 -0500 (EST),
Justin Piszcz <jpiszcz@lucidpixels.com> wrote:

> So for an 18 disk raid-6 with 256k stripe you would recommend:
> 
> mkfs.xfs with su=256k,sw=16 for optimal performance with inode64
> mount option?
> 

Yes, that should be the best setting.
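A sketch of the resulting command (the device name is a placeholder); the sw value falls out of the disk count:

```shell
# an 18-disk RAID-6 carries two disks' worth of parity, leaving 16 data disks
disks=18; parity=2
sw=$((disks - parity))
echo "mkfs.xfs -d su=256k,sw=$sw /dev/sdX"
# prints: mkfs.xfs -d su=256k,sw=16 /dev/sdX
```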

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23  9:43           ` Justin Piszcz
  2010-12-23 12:03             ` Emmanuel Florac
@ 2010-12-23 18:06             ` Justin Piszcz
  2010-12-23 18:55               ` Emmanuel Florac
  2010-12-23 21:12               ` Eric Sandeen
  1 sibling, 2 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-23 18:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Eric Sandeen, xfs


On Thu, 23 Dec 2010, Justin Piszcz wrote:


Hi,

How come parted using optimal 1MiB alignment is slower than no
partition?  In addition, sunit and swidth set properly as mentioned earlier
appear to be _slower_ than defaults with no partitions.


http://home.comcast.net/~jpiszcz/20101223/final.html

Justin.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 18:06             ` Justin Piszcz
@ 2010-12-23 18:55               ` Emmanuel Florac
  2010-12-23 19:07                 ` Justin Piszcz
  2010-12-23 19:29                 ` Justin Piszcz
  2010-12-23 21:12               ` Eric Sandeen
  1 sibling, 2 replies; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-23 18:55 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

On Thu, 23 Dec 2010 13:06:10 -0500 (EST), you wrote:

> http://home.comcast.net/~jpiszcz/20101223/final.html
> 

Something's wrong with the file create/stat/delete tests. Did you mount
with "nobarrier"? 
Which drives, controller firmware, raid level, stripe width? 

BTW don't run only one test, it's meaningless. I always run at least 8
cycles (and up to 30 or 40 cycles) and then calculate the average and
standard deviation, because one test among a cycle may vary wildly for
some reason. You don't need the "char" tests, that doesn't correspond
to any real-life usage pattern. Better run bonnie with the -f option,
and -x with some large enough value.
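A hypothetical invocation along those lines (mount point, user, and cycle count are examples, not from the thread):

```shell
# -f skips the per-char tests; -x 8 repeats the full run 8 times, CSV on stdout
bonnie++ -d /mnt/test -u nobody -f -x 8 > bonnie.csv
```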

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 18:55               ` Emmanuel Florac
@ 2010-12-23 19:07                 ` Justin Piszcz
  2010-12-23 19:54                   ` Stan Hoeppner
  2010-12-23 21:50                   ` Emmanuel Florac
  2010-12-23 19:29                 ` Justin Piszcz
  1 sibling, 2 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-23 19:07 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs




On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Thu, 23 Dec 2010 13:06:10 -0500 (EST) vous écriviez:
>
>> http://home.comcast.net/~jpiszcz/20101223/final.html
>>
>
> Something's wrong with the file create/stat/delete tests. Did you mount
> with "nobarrier"?
No, default mount options..
Also, I just changed it and will update the page in a bit; the raid was on
balance mode, and with performance the raid-rewrite went to ~420-430MiB/s,
much faster.

> Which drives, controller firmware, raid level, stripe width?
Hitachi 7K3000 7200RPM 3TB Drives
Latest firmware, 10.2 I think for the 9750-24ie
Raid Level = 6
Stripe width = 256k (default)
>
> BTW don't run only one test, it's meaningless. I always run at least 8
> cycles (and up to 30 or 40 cycles) and then calculate the average and
> standard deviation, because one test among a cycle may vary wildly for
> some reason. You don't need the "char" tests, that doesn't correspond
> to any real-life usage pattern. Better run bonnie with the -f option,
> and -x with some large enough value.

I ran 3 tests and took the average of the 3 runs for each unit test.
I use this test because I have been using it for 3-4+ years so I can compare
apples to apples.

If it's +++ or blank in the HTML, that means it ran too fast to measure,
I believe.

The main thing I wonder is why, when the partition is aligned to 1MiB (the
default in parted 2.2+, I believe), it is slower than with no partitions?

I will try again with mode=performance on the RAID controller..




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 18:55               ` Emmanuel Florac
  2010-12-23 19:07                 ` Justin Piszcz
@ 2010-12-23 19:29                 ` Justin Piszcz
  2010-12-23 19:58                   ` Stan Hoeppner
  2010-12-24  1:01                   ` Stan Hoeppner
  1 sibling, 2 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-23 19:29 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Thu, 23 Dec 2010 13:06:10 -0500 (EST) vous écriviez:
>
>> http://home.comcast.net/~jpiszcz/20101223/final.html
>>

Please check the updated page:
http://home.comcast.net/~jpiszcz/20101223/final.html

Using a partition shows a slight degradation in the re-write speed but
an increase in performance for sequential output and input with the mode
set to perform.  Looks like this is what I will be using, as it gives the
fastest speeds overall except for the rewrite.

Thanks!

Justin.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 19:07                 ` Justin Piszcz
@ 2010-12-23 19:54                   ` Stan Hoeppner
  2010-12-23 21:48                     ` Emmanuel Florac
  2010-12-23 21:50                   ` Emmanuel Florac
  1 sibling, 1 reply; 27+ messages in thread
From: Stan Hoeppner @ 2010-12-23 19:54 UTC (permalink / raw)
  To: xfs

Justin Piszcz put forth on 12/23/2010 1:07 PM:

> Main wonder I have is why when the partition is aligned to 1MiB, which is
> the default in parted 2.2+ I believe, is it slower than with no partitions?

Best guess?  Those 3TB Hitachi drives use 512 byte translated native 4KB
sectors.  The 9750-24 ie card doesn't know how to properly align
partitions on such drives, and/or you're using something other than
fdisk or parted to create your partitions.  Currently these are the only
two partitioners that can align partitions properly on 512 byte
translated/native 4KB sector drives.  Thus you're taking a performance
hit, same as with the WD "Advanced Format" drives which have 512 byte
translated/native 4KB sectors.
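One way to check what a drive actually reports (sdX is a placeholder):

```shell
# a 512-byte-emulated ("Advanced Format") drive shows 512 logical / 4096 physical
cat /sys/block/sdX/queue/logical_block_size
cat /sys/block/sdX/queue/physical_block_size
```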

If you want maximum performance with least configuration headaches,
avoid 512B/4KB sector hybrid drives.  If you _need_ maximum drive
capacity, live with the warts, or jump through hoops to get the
partitions aligned, or live without partitions if you can.

-- 
Stan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 19:29                 ` Justin Piszcz
@ 2010-12-23 19:58                   ` Stan Hoeppner
  2010-12-24  1:01                   ` Stan Hoeppner
  1 sibling, 0 replies; 27+ messages in thread
From: Stan Hoeppner @ 2010-12-23 19:58 UTC (permalink / raw)
  To: xfs

Justin Piszcz put forth on 12/23/2010 1:29 PM:

> Using a partition shows a slight degradation in the re-write speed but
> an increase in performance for sequential output and input with the mode
> set to perform.  Looks like this is what I will be using, as it gives
> the fastest speeds overall except for the rewrite.

As Dave mentioned earlier, performance may degrade significantly over
time as the FS grows and ages, compared to running benchies against an
empty filesystem today, especially if your mkfs.xfs parms were off the
mark when creating.

-- 
Stan


* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 18:06             ` Justin Piszcz
  2010-12-23 18:55               ` Emmanuel Florac
@ 2010-12-23 21:12               ` Eric Sandeen
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Sandeen @ 2010-12-23 21:12 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

On 12/23/10 12:06 PM, Justin Piszcz wrote:
> 
> On Thu, 23 Dec 2010, Justin Piszcz wrote:
> 
> 
> Hi,
> 
> How come parted using optimal (1 MiB) alignment is slower than no
> partition?

because parted got it wrong, sounds like.

> In addition, sunit and swidth set properly as mentioned
> earlier appears to be _slower_ than defaults with no partitions.

stripe unit over an incorrectly aligned partition won't help
and I suppose could make it worse.

align your partitions, using sector units, to a stripe width unit.
Set the stripe width properly on the fs on top of that.
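A hedged sketch of that recipe (the chunk size, disk count, and device name are assumptions for illustration, not from the thread): with a 64 KiB stripe unit across 10 data disks, the stripe width is 640 KiB = 1280 sectors, so start the partition on a multiple of 1280 sectors and give mkfs.xfs the same geometry:

```shell
# Assumed geometry: RAID-6 of 12 drives (10 data disks), 64 KiB chunk.
SU_KIB=64
DATA_DISKS=10
SW_SECTORS=$(( SU_KIB * 1024 * DATA_DISKS / 512 ))   # stripe width in sectors
echo "stripe width: $SW_SECTORS sectors"             # 1280 sectors = 640 KiB

# Align the partition start to a full stripe width, in sector units:
#   parted -s /dev/sdX unit s mkpart primary $SW_SECTORS 100%
# Then tell XFS the same geometry on top of it:
#   mkfs.xfs -d su=${SU_KIB}k,sw=$DATA_DISKS /dev/sdX1
```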

-Eric
 
> 
> http://home.comcast.net/~jpiszcz/20101223/final.html
> 
> Justin.
> 


* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 19:54                   ` Stan Hoeppner
@ 2010-12-23 21:48                     ` Emmanuel Florac
  2010-12-23 23:21                       ` Stan Hoeppner
  0 siblings, 1 reply; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-23 21:48 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

Le Thu, 23 Dec 2010 13:54:14 -0600 vous écriviez:

> Best guess?  Those 3TB Hitachi drives use 512 byte translated native
> 4KB sectors. 

Yes, I'm sure that no new drive model comes with true 512B sectors
anymore...

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 19:07                 ` Justin Piszcz
  2010-12-23 19:54                   ` Stan Hoeppner
@ 2010-12-23 21:50                   ` Emmanuel Florac
  2010-12-23 22:04                     ` Justin Piszcz
  1 sibling, 1 reply; 27+ messages in thread
From: Emmanuel Florac @ 2010-12-23 21:50 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: xfs

Le Thu, 23 Dec 2010 14:07:13 -0500 (EST) vous écriviez:

> Main wonder I have is why when the partition is aligned to 1MiB,
> which is the default in parted 2.2+ I believe, is it slower than with
> no partitions?

1 MiB possibly doesn't fall evenly on the stripe boundaries. I suppose you
could get better results with 64KB or 16KB stripes. Did you try with an
LVM in between?
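That is easy to check: a 1 MiB start lands on a full-stripe boundary only when the stripe width divides 1 MiB evenly (the chunk sizes and disk counts below are illustrative assumptions):

```shell
# Does a 1 MiB partition start land on a full-stripe boundary?
check() {
    chunk_kib=$1; data_disks=$2
    sw_kib=$(( chunk_kib * data_disks ))
    if [ $(( 1024 % sw_kib )) -eq 0 ]; then
        echo "chunk=${chunk_kib}K x ${data_disks} disks: 1 MiB is stripe-aligned"
    else
        echo "chunk=${chunk_kib}K x ${data_disks} disks: 1 MiB is NOT stripe-aligned"
    fi
}

check 64 8     # 512 KiB stripe width: 1 MiB is a multiple, aligned
check 64 10    # 640 KiB stripe width: 1 MiB is not a multiple, misaligned
```

So with a non-power-of-two data-disk count, the parted 1 MiB default can be physically 4K-aligned yet still miss the RAID stripe boundary.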

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 21:50                   ` Emmanuel Florac
@ 2010-12-23 22:04                     ` Justin Piszcz
  0 siblings, 0 replies; 27+ messages in thread
From: Justin Piszcz @ 2010-12-23 22:04 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


Hi,

I had not tested with 64KB or 16KB stripes:

I used optimal (default I believe) in newer 2.2+ parted:

        -a alignment-type, --align alignment-type
               Set  alignment  for  newly  created  partitions, valid alignment
               types are:

               none   Use the minimum alignment allowed by the disk type.

               cylinder
                      Align partitions to cylinders.

               minimal
                      Use minimum alignment  as  given  by  the  disk  topology
                      information.  This  and  the  opt  value  will use layout
                      information provided by the disk  to  align  the  logical
                      partition  table  addresses  to actual physical blocks on
                      the disks.  The min value is the minimum alignment needed
                      to align the partition properly to physical blocks, which
                      avoids performance degradation.

               optimal
                      Use optimum alignment  as  given  by  the  disk  topology
                      information.  This  aligns  to a multiple of the physical
                      block size in a way that guarantees optimal performance.

I'm happy with the performance now.  I get 16GB of RAM tomorrow, so
hopefully that'll be enough if I need to run xfs_repair.

Justin.


On Thu, 23 Dec 2010, Emmanuel Florac wrote:

> Le Thu, 23 Dec 2010 14:07:13 -0500 (EST) vous écriviez:
>
>> Main wonder I have is why when the partition is aligned to 1MiB,
>> which is the default in parted 2.2+ I believe, is it slower than with
>> no partitions?
>
> 1 MiB possibly doesn't fall evenly on the stripe boundaries. I suppose you
> could get better results with 64KB or 16KB stripes. Did you try with an
> LVM in between?
>
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
>


* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 21:48                     ` Emmanuel Florac
@ 2010-12-23 23:21                       ` Stan Hoeppner
  0 siblings, 0 replies; 27+ messages in thread
From: Stan Hoeppner @ 2010-12-23 23:21 UTC (permalink / raw)
  To: xfs

Emmanuel Florac put forth on 12/23/2010 3:48 PM:
> Le Thu, 23 Dec 2010 13:54:14 -0600 vous écriviez:
> 
>> Best guess?  Those 3TB Hitachi drives use 512 byte translated native
>> 4KB sectors. 
> 
> Yes, I'm sure that no new drive model comes with true 512B sectors
> anymore...

I believe most/all shipping drives of 1TB and smaller still have native
512 byte sectors, dependent on specific vendor/model line of course.
It's mainly the 1.5TB and up drives with the hybrid 512/4096 byte sector
abomination.

It would be far better if they'd just ship native 4K sector drives,
wouldn't it?  Isn't most of Linux already patched for pure 4k sector
drives?  Is XFS ready for such native 4k sector drives?  Are the various
RAID cards/SAN array controllers?
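A drive's sector layout can be read back and classified from its logical and physical sector sizes (the device name is an assumption; `blockdev --getss --getpbsz /dev/sdX` reports the two values):

```shell
# Classify a drive from its logical and physical sector sizes, in bytes.
classify() {
    logical=$1; physical=$2
    if   [ "$logical" -eq 4096 ]; then echo "4Kn (native 4K)"
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 4096 ]; then echo "512e (4K emulated)"
    elif [ "$logical" -eq 512 ] && [ "$physical" -eq 512 ]; then echo "512n (native 512)"
    else echo "unknown"
    fi
}

# Real values would come from: blockdev --getss --getpbsz /dev/sdX
classify 512 512     # older drives with true 512-byte sectors
classify 512 4096    # the hybrid "Advanced Format" drives discussed above
```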

-- 
Stan



* Re: Anyone using XFS in production on > 20TiB volumes?
  2010-12-23 19:29                 ` Justin Piszcz
  2010-12-23 19:58                   ` Stan Hoeppner
@ 2010-12-24  1:01                   ` Stan Hoeppner
  1 sibling, 0 replies; 27+ messages in thread
From: Stan Hoeppner @ 2010-12-24  1:01 UTC (permalink / raw)
  To: xfs

Justin Piszcz put forth on 12/23/2010 1:29 PM:

> Please check the updated page:
> http://home.comcast.net/~jpiszcz/20101223/final.html
> 
> Using a partition shows a slight degradation in the re-write speed but
> an increase in performance for sequential output and input with the mode
> set to perform.  Looks like this is what I will be using, as it gives
> the fastest speeds overall except for the rewrite.

If your primary workloads for this array are mostly single user/thread
streaming writes/reads then this may be fine.  If they are multi-user or
multi-threaded random re-write server loads, re-write is the most
important metric and you should optimize for that scenario alone, as its
performance is most dramatically impacted by parity RAID schemes such as
RAID 6.
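As a rough model of why (an illustrative assumption, not a measurement from this thread): a sub-stripe random rewrite on RAID-6 takes a read-modify-write of the data block plus both parity blocks:

```shell
# Simplified RAID-6 read-modify-write cost for one sub-stripe rewrite:
# read old data, old P, old Q; then write new data, new P, new Q.
READS=3
WRITES=3
RMW_IOS=$(( READS + WRITES ))
echo "RAID-6 small rewrite: $RMW_IOS disk I/Os"   # vs 2 writes on RAID-10
```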

-- 
Stan



end of thread, other threads:[~2010-12-24  0:59 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-22 16:30 Anyone using XFS in production on > 20TiB volumes? Justin Piszcz
2010-12-22 16:56 ` Emmanuel Florac
2010-12-22 19:03   ` Eric Sandeen
2010-12-23  0:26     ` Emmanuel Florac
2010-12-23  0:28       ` Justin Piszcz
2010-12-23  0:56         ` Dave Chinner
2010-12-23  9:43           ` Justin Piszcz
2010-12-23 12:03             ` Emmanuel Florac
2010-12-23 18:06             ` Justin Piszcz
2010-12-23 18:55               ` Emmanuel Florac
2010-12-23 19:07                 ` Justin Piszcz
2010-12-23 19:54                   ` Stan Hoeppner
2010-12-23 21:48                     ` Emmanuel Florac
2010-12-23 23:21                       ` Stan Hoeppner
2010-12-23 21:50                   ` Emmanuel Florac
2010-12-23 22:04                     ` Justin Piszcz
2010-12-23 19:29                 ` Justin Piszcz
2010-12-23 19:58                   ` Stan Hoeppner
2010-12-24  1:01                   ` Stan Hoeppner
2010-12-23 21:12               ` Eric Sandeen
2010-12-23  1:10         ` Emmanuel Florac
2010-12-22 17:06 ` Chris Wedgwood
2010-12-22 17:10   ` Justin Piszcz
2010-12-22 17:32     ` Chris Wedgwood
2010-12-22 17:35       ` Justin Piszcz
2010-12-22 18:50         ` Chris Wedgwood
2010-12-22 19:24           ` Justin Piszcz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox