linux-xfs.vger.kernel.org archive mirror
* Safe XFS limits (100TB+)
@ 2017-02-02 16:46 fuser ct1
  2017-02-02 17:16 ` Eric Sandeen
  2017-02-02 18:16 ` Emmanuel Florac
  0 siblings, 2 replies; 7+ messages in thread
From: fuser ct1 @ 2017-02-02 16:46 UTC (permalink / raw)
  To: linux-xfs

Hello list.

Despite searching I couldn't find guidance, or many use cases, regarding
XFS beyond 100TB.

Of course the filesystem limits are way beyond this, but I was looking for
real world experiences...

Specifically I'm wondering about the sanity of using XFS with a couple of
144TB block devices (my system will have two 22x8TB R60 in a 44 bay JBOD).
My storage is used for video editing/post production.

* Has anybody here tried?
* What is the likelihood of xfs_repair/check finishing if I ever needed to
run it?
* Am I nuts?

I know that beyond a certain point I should be looking at a scale out
option, but the level of complexity and cost goes up considerably.

More info:
=======

Previously I had an 80TB usable (96TB raw) array with an LSI MegaRAID 9361
controller. This worked very nicely and was FAST. I was careful to choose
the inode64 fstab mount option. The OS was Debian Jessie, which ships
xfsprogs version 3.2.1.

Thanks in advance and sorry if this is not the right list.


* Re: Safe XFS limits (100TB+)
  2017-02-02 16:46 Safe XFS limits (100TB+) fuser ct1
@ 2017-02-02 17:16 ` Eric Sandeen
  2017-02-02 17:52   ` fuser ct1
  2017-02-02 18:16 ` Emmanuel Florac
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2017-02-02 17:16 UTC (permalink / raw)
  To: fuser ct1, linux-xfs

On 2/2/17 10:46 AM, fuser ct1 wrote:
> Hello list.
> 
> Despite searching I couldn't find guidance, or many use cases, regarding
> XFS beyond 100TB.
> 
> Of course the filesystem limits are way beyond this, but I was looking for
> real world experiences...
> 
> Specifically I'm wondering about the sanity of using XFS with a couple of
> 144TB block devices (my system will have two 22x8TB R60 in a 44 bay JBOD).
> My storage is used for video editing/post production.
> 
> * Has anybody here tried?

XFS has been used well past 100T, sure.

> * What is the likelihood of xfs_repair/check finishing if I ever needed to
> run it?

xfs_check no, but it's deprecated anyway because it doesn't scale.

xfs_repair yes, though the amount of resources needed will depend on
the details of how you populate the filesystem.

On my puny celeron with 8g ram, xfs_repair of an empty 288T image file
takes 2 seconds.  Filling it with files will change this :)
But if it's for video editing I presume you will actually be fairly
light on the metadata, with a not-insane number of inodes, and very
large files.

But you do want to make sure that the machine administering the filesystem
is fairly beefy, for xfs_repair purposes.

http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F
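
If you want a rough feel before going into production, you can do something
along the lines of my empty-image timing test, plus a dry run on the real
device -- a minimal sketch, the image path and device name are placeholders,
and the host filesystem has to support very large sparse files:

  truncate -s 288T /scratch/big.img
  mkfs.xfs -f /scratch/big.img         # -f: overwrite any old signature
  xfs_repair -f /scratch/big.img       # -f: operate on an image file

  xfs_repair -n /dev/sdX               # check-only, makes no changes
  xfs_repair -m 4096 /dev/sdX          # cap memory use (in MB) if RAM is tight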

> * Am I nuts?

Probably not.   :)

-Eric

> I know that beyond a certain point I should be looking at a scale out
> option, but the level of complexity and cost goes up considerably.
> 
> More info:
> =======
> 
> Previously I had an 80TB usable (96TB raw) with an LSI MegaRAID 9361
> controller. This worked very nicely and was FAST. I was careful to choose
> the inode64 fstab mount option. The OS was Debian Jessie, which has
> XFSPROGS version 3.2.1.
> 
> Thanks in advance and sorry if this is not the right list.


* Re: Safe XFS limits (100TB+)
  2017-02-02 17:16 ` Eric Sandeen
@ 2017-02-02 17:52   ` fuser ct1
  2017-02-02 17:55     ` Eric Sandeen
  0 siblings, 1 reply; 7+ messages in thread
From: fuser ct1 @ 2017-02-02 17:52 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

Thanks for the fast reply Eric!

It's good to know I'm not completely off my chop :-)

288T! Wow, OK. Is that safe and workable when it starts to get full?
(say around 70% utilized)

In this case the machine will have 128G memory and a dual E5 Xeon. The
filesystem will have a fair number of files, but they're all going to
be a fair few G.

Regarding xfs_repair/check, thanks for the heads up. I've only had to
use repair once, due to some mess caused by unsupported firmware on an
Adaptec 71605 (learned my lesson quickly there).

I wonder if my eyes are too big for my stomach now? Why have 144TB x2
if you can have one big filesystem...

My fstab options are as such:

rw,nobarrier,inode64

Also, FWIW I didn't partition the device last time, just ended up with...

/dev/sdb: UUID="ccac4134-12a0-4dbd-9365-d2e166d927ed" TYPE="xfs"


On Thu, Feb 2, 2017 at 5:16 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 2/2/17 10:46 AM, fuser ct1 wrote:
>> Hello list.
>>
>> Despite searching I couldn't find guidance, or many use cases, regarding
>> XFS beyond 100TB.
>>
>> Of course the filesystem limits are way beyond this, but I was looking for
>> real world experiences...
>>
>> Specifically I'm wondering about the sanity of using XFS with a couple of
>> 144TB block devices (my system will have two 22x8TB R60 in a 44 bay JBOD).
>> My storage is used for video editing/post production.
>>
>> * Has anybody here tried?
>
> XFS has been used well past 100T, sure.
>
>> * What is the likelihood of xfs_repair/check finishing if I ever needed to
>> run it?
>
> xfs_check no, but it's deprecated anyway because it doesn't scale.
>
> xfs_repair yes, though the amount of resources needed will depend on
> the details of how you populate the filesystem.
>
> On my puny celeron with 8g ram, xfs_repair of an empty 288T image file
> takes 2 seconds.  Filling it with files will change this :)
> But if it's for video editing I presume you will actually be fairly
> light on the metadata, with a not-insane number of inodes, and very
> large files.
>
> But you do want to make sure that the machine administering the filesystem
> is fairly beefy, for xfs_repair purposes.
>
> http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F
>
>> * Am I nuts?
>
> Probably not.   :)
>
> -Eric
>
>> I know that beyond a certain point I should be looking at a scale out
>> option, but the level of complexity and cost goes up considerably.
>>
>> More info:
>> =======
>>
>> Previously I had an 80TB usable (96TB raw) with an LSI MegaRAID 9361
>> controller. This worked very nicely and was FAST. I was careful to choose
>> the inode64 fstab mount option. The OS was Debian Jessie, which has
>> XFSPROGS version 3.2.1.
>>
>> Thanks in advance and sorry if this is not the right list.


* Re: Safe XFS limits (100TB+)
  2017-02-02 17:52   ` fuser ct1
@ 2017-02-02 17:55     ` Eric Sandeen
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2017-02-02 17:55 UTC (permalink / raw)
  To: fuser ct1; +Cc: linux-xfs

On 2/2/17 11:52 AM, fuser ct1 wrote:
> Thanks for the fast reply Eric!
> 
> It's good to know I'm not completely off my chop :-)
> 
> 288T! Wow, OK. Is that safe and workable when it starts to get full?
> (say around 70% utilized)
> 
> In this case the machine will have 128G memory and a dual E5 Xeon. The
> filesystem will have a fair number of files, but they're all going to
> be a fair few G.

That's probably not unreasonable.  You can always feed it swap and wait
longer, if you have to.
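
Adding swap for a one-off repair is just something like this; the file
location and size here are arbitrary examples:

  dd if=/dev/zero of=/srv/repair.swap bs=1M count=65536   # 64G swap file
  chmod 600 /srv/repair.swap
  mkswap /srv/repair.swap
  swapon /srv/repair.swap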

> Regarding xfs_repair/check, thanks for the heads up. I've only had to
> use repair once, due to some mess caused by unsupported firmware on an
> Adaptec 71605 (learned my lesson quickly there).
> 
> I wonder if my eyes are too big for my stomach now? Why have 144TB x2
> if you can have one big filesystem...

Just more to lose all at once if something /does/ go wrong.

And yes, storage behaving properly is critical to things not going
wrong...

> My fstab options are as such:
> 
> rw,nobarrier,inode64

inode64 is the default upstream.
nobarrier should never be used, and is deprecated upstream.
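
On a reasonably recent kernel an fstab line like this is plenty -- just a
sketch, the mount point is a placeholder and the explicit inode64 is
redundant there but harmless:

  UUID=ccac4134-12a0-4dbd-9365-d2e166d927ed  /mnt/video  xfs  defaults,inode64  0  0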

> Also, FWIW I didn't partition the device last time, just ended up with...
> 
> /dev/sdb: UUID="ccac4134-12a0-4dbd-9365-d2e166d927ed" TYPE="xfs"

xfs doesn't care what block device it lives on, though some misbehaving
utilities might stomp on it if it looks unpartitioned.
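
If you want to guard against that, a single partition covering the whole
disk costs you nothing -- a sketch for next time, the device name is only
an example:

  parted -s /dev/sdb mklabel gpt mkpart primary xfs 1MiB 100%
  mkfs.xfs /dev/sdb1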

-Eric

> 
> On Thu, Feb 2, 2017 at 5:16 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
>> On 2/2/17 10:46 AM, fuser ct1 wrote:
>>> Hello list.
>>>
>>> Despite searching I couldn't find guidance, or many use cases, regarding
>>> XFS beyond 100TB.
>>>
>>> Of course the filesystem limits are way beyond this, but I was looking for
>>> real world experiences...
>>>
>>> Specifically I'm wondering about the sanity of using XFS with a couple of
>>> 144TB block devices (my system will have two 22x8TB R60 in a 44 bay JBOD).
>>> My storage is used for video editing/post production.
>>>
>>> * Has anybody here tried?
>>
>> XFS has been used well past 100T, sure.
>>
>>> * What is the likelihood of xfs_repair/check finishing if I ever needed to
>>> run it?
>>
>> xfs_check no, but it's deprecated anyway because it doesn't scale.
>>
>> xfs_repair yes, though the amount of resources needed will depend on
>> the details of how you populate the filesystem.
>>
>> On my puny celeron with 8g ram, xfs_repair of an empty 288T image file
>> takes 2 seconds.  Filling it with files will change this :)
>> But if it's for video editing I presume you will actually be fairly
>> light on the metadata, with a not-insane number of inodes, and very
>> large files.
>>
>> But you do want to make sure that the machine administering the filesystem
>> is fairly beefy, for xfs_repair purposes.
>>
>> http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F
>>
>>> * Am I nuts?
>>
>> Probably not.   :)
>>
>> -Eric
>>
>>> I know that beyond a certain point I should be looking at a scale out
>>> option, but the level of complexity and cost goes up considerably.
>>>
>>> More info:
>>> =======
>>>
>>> Previously I had an 80TB usable (96TB raw) with an LSI MegaRAID 9361
>>> controller. This worked very nicely and was FAST. I was careful to choose
>>> the inode64 fstab mount option. The OS was Debian Jessie, which has
>>> XFSPROGS version 3.2.1.
>>>
>>> Thanks in advance and sorry if this is not the right list.
> 


* Re: Safe XFS limits (100TB+)
  2017-02-02 16:46 Safe XFS limits (100TB+) fuser ct1
  2017-02-02 17:16 ` Eric Sandeen
@ 2017-02-02 18:16 ` Emmanuel Florac
  2017-02-02 19:14   ` Martin Steigerwald
       [not found]   ` <CAL8yqih36vWy-Z1PESVZOqDEoW8G9=k5LBM0aToe4JhBM755bA@mail.gmail.com>
  1 sibling, 2 replies; 7+ messages in thread
From: Emmanuel Florac @ 2017-02-02 18:16 UTC (permalink / raw)
  To: fuser ct1; +Cc: linux-xfs


On Thu, 2 Feb 2017 16:46:09 +0000,
fuser ct1 <fuserct1@gmail.com> wrote:

> Hello list.
> 
> Despite searching I couldn't find guidance, or many use cases,
> regarding XFS beyond 100TB.
> 
> Of course the filesystem limits are way beyond this, but I was
> looking for real world experiences...

I manage and support several hosts I built and set up, some running for
many years, with very large XFS volumes.
Recent XFS volumes with XFS v5 seem to promise even more robustness,
thanks to metadata checksums.

Currently in use under heavy load are machines with the following usable
volumes, almost all of them using RAID 60 (21 to 28 drives, x2 or x3):

1 490 TB volume
3 390 TB volumes
1 240 TB volume 
2 180 TB volumes 
5 160 TB volumes 
11 120 TB volumes
4 90 TB volumes
14 77 TB volumes
many, many 50 and 40 TB volumes.

A 2x22-disk RAID 60 is perfectly OK, as long as you're using good disks.
I only use HGST, and have a failure rate so low I don't even bother
tracking it precisely anymore (2 or 3 failures a year among the couple
of thousand disks listed above).

Use recent xfs progs and kernel, use xfs v5 if possible. Don't forget
proper optimisations (use noop scheduler, enlarge nr_requests and
read_ahead_kb a lot) for high sequential throughput (video is all about
sequential throughput) and you should be happy and safe.
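
Something like this, for example (the device name and the numbers are just
starting points, adjust for your controller and workload):

  echo noop > /sys/block/sdb/queue/scheduler
  echo 1024 > /sys/block/sdb/queue/nr_requests
  echo 8192 > /sys/block/sdb/queue/read_ahead_kb

and, on an xfsprogs that does not yet create v5 filesystems by default:

  mkfs.xfs -m crc=1 /dev/sdb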

xfs_repair on a full, fast 100 TB volume only needs 15 minutes or so.
And that was after a very, very bad power event (someone connected a
studio light to the UPS and brought everything down, literally in
flames).

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------



* Re: Safe XFS limits (100TB+)
  2017-02-02 18:16 ` Emmanuel Florac
@ 2017-02-02 19:14   ` Martin Steigerwald
       [not found]   ` <CAL8yqih36vWy-Z1PESVZOqDEoW8G9=k5LBM0aToe4JhBM755bA@mail.gmail.com>
  1 sibling, 0 replies; 7+ messages in thread
From: Martin Steigerwald @ 2017-02-02 19:14 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: fuser ct1, linux-xfs

On Thursday, 2 February 2017 at 19:16:27 CET, Emmanuel Florac wrote:
> Use recent xfs progs and kernel, use xfs v5 if possible. Don't forget
> proper optimisations (use noop scheduler, enlarge nr_requests and
> read_ahead_kb a lot) for high sequential throughput (video is all about
> sequential throughput) and you should be happy and safe.

Just adding some Debian hints:

For Debian Jessie that means the backports kernel – currently 4.8. There
is no backport of xfsprogs available though, and 3.2 is pretty old. I am
not sure since when xfsprogs creates XFS v5 filesystems by default – so
maybe the proper options to activate XFS v5 are needed. There is always
the option to compile xfsprogs yourself, or just start out with Debian
Testing, as it is being frozen more and more before the release of Debian
Stretch. I would be surprised by any major hiccups regarding XFS in Debian
Testing before the release. It will very likely have kernel 4.9 and
xfsprogs 4.9.
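
For the record, the backports kernel is just (assuming jessie-backports
is already in sources.list):

  apt-get -t jessie-backports install linux-image-amd64

and building xfsprogs from a release tarball is the usual dance (the
version below is only an example):

  wget https://www.kernel.org/pub/linux/utils/fs/xfs/xfsprogs/xfsprogs-4.9.0.tar.xz
  tar xf xfsprogs-4.9.0.tar.xz && cd xfsprogs-4.9.0
  ./configure && make && make install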

Thanks,
-- 
Martin


* Re: Safe XFS limits (100TB+)
       [not found]   ` <CAL8yqih36vWy-Z1PESVZOqDEoW8G9=k5LBM0aToe4JhBM755bA@mail.gmail.com>
@ 2017-02-03 17:10     ` Emmanuel Florac
  0 siblings, 0 replies; 7+ messages in thread
From: Emmanuel Florac @ 2017-02-03 17:10 UTC (permalink / raw)
  To: fuser ct1, linux-xfs


On Thu, 2 Feb 2017 18:48:50 +0000,
fuser ct1 <fuserct1@gmail.com> wrote:

> >I manage and support several hosts I built and set up, some running
> >for many years, with very large XFS volumes.
> >Recent XFS volumes with XFS v5 seem to promise even more robustness,
> >thanks to metadata checksums.  
> 
> Thanks, this is good to know, although I think the distributions I use
> are at best running xfsprogs 4.3.0+nmu1ubuntu1 (Ubuntu 16.04). Might go
> fishing in backports though.

4.3 should be good. XFS v5 requires at least kernel 3.16.

> The checksum idea is interesting, I'll have a read - having worked
> with ZFS for some time too, it'll be interesting to see how this
> feature compares.

It's only metadata checksumming in XFS. Much faster (but of course less
safe; however you can scrub using the RAID controller, instead).
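
You can tell whether an existing filesystem already has the v5 checksums
from xfs_info (the mount point here is just an example; crc=1 in the
meta-data line means v5):

  xfs_info /mnt/video | grep crc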

> >Currently in use under heavy load machines with the following usable
> >volumes, almost all of them using RAID 60 (21 to 28 drives x 2 or
> >x3):
> >
> >1 490 TB volume
> >3 390 TB volumes
> >1 240 TB volume
> >2 180 TB volumes
> >5 160 TB volumes
> >11 120 TB volumes
> >4 90 TB volumes
> >14 77 TB volumes
> >many, many 50 and 40 TB volumes.  
> 
> The 390TB thing looks tempting. With this LSI one could probably do 1x
> logical volume comprised of two spans of 22x R60, which would yield
> something like 288TB usable.

No, these are USABLE volumes. 390 TB is the usable capacity of a 60 x 8TB
drive chassis (480 TB raw), split into 2 x 29 drives + 2 spares.

On most systems I use 2 controllers (one per array) for higher
performance (though it doesn't make that much of a difference with the
last generation).

> >2x22 disks Raid 60 is perfectly OK, as long as you're using good
> >disks. I only use HGST, and have a failure rate so low I don't even
> >bother tracking it precisely anymore (like 2 or 3 failures a year
> >among the couple thousands disks listed above).  
> 
> I've planned for 7K6 Ultrastars. The HGSTs never give me much trouble.
> Sometimes I've had dead ones upon init, but that's pretty normal I
> guess.

As the latest Backblaze report shows, not all Seagate drives are
bad; however, all the terrible hard disk models come from Seagate...

> >Use recent xfs progs and kernel, use xfs v5 if possible. Don't forget
> >proper optimisations (use noop scheduler, enlarge nr_requests and
> >read_ahead_kb a lot) for high sequential throughput (video is all
> >about sequential throughput) and you should be happy and safe.  
> 
> Normally using NOOP, 1024 nr_requests and 8196 read ahead.

Good :)
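
To make those settings survive a reboot, a udev rule along these lines
does the job (the rule file name and device match are only examples):

  # /etc/udev/rules.d/60-video-storage.rules
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[bc]", ATTR{queue/scheduler}="noop", ATTR{queue/nr_requests}="1024", ATTR{queue/read_ahead_kb}="8192"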

> >xfs_repair on a filled fast 100 TB volume only needs 15 minutes or
> >so. And it was after a very, very bad power event (someone connected
> >a studio light to the UPS and brought everything down literally in
> >flames).  
> 
> Thanks that's really helpful to have a frame of reference!

It used to be much worse a few years back, when xfs_repair gobbled up RAM.
I remember setting up additional swap space on USB drives to be able to
repair... That was wayyyyy slower back then :)

Given you have enough memory (32G or more), nowadays xfs_repair on a
huge filesystem is a breeze, even with gazillions of files (like DPX
or EXR image sequences...).

[I'm cc'ing the list because the information may help someone else
someday :)]

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


