public inbox for linux-xfs@vger.kernel.org
* xfs + 100TB+ storage + lots of small files + NFS
@ 2016-07-09 11:14 Marcin Sura
  2016-07-10  9:24 ` Ric Wheeler
  2016-07-10 23:48 ` Dave Chinner
  0 siblings, 2 replies; 3+ messages in thread
From: Marcin Sura @ 2016-07-09 11:14 UTC (permalink / raw)
  To: xfs



Hi,

A friend of mine asked me to evaluate XFS for their use case. I currently
don't have physical access to their system, but here is the information
I've gathered so far:

SAN:
- physical storage is an FSC array with a thin-provisioned RAID 6 volume
- volumes are 100TB+ in size
- the array contains SSDs, which could potentially be used for the
journal
- storage is connected to the host via 10GbE iSCSI

Host:
- They are using CentOS 6.5 with the stock 2.6.32-* kernel
- The system uses all default values; no optimization has been done
- The OS is installed on an SSD
- I don't know the exact CPU details, but I assume a recent multicore CPU
- I don't know the amount of RAM installed; I assume 32GB+

NFS:
- they are exporting the filesystem via NFS to 10-20 clients (services),
some VMs, some bare metal
- clients are connected via 1GbE or 10GbE links

Workload:
- they are storing tens or hundreds of millions of small files
- the files are not all in a single directory
- files are under 1K, usually 200 - 500 bytes
- I assume that some NFS clients write files constantly
- some NFS clients initiate massive reads of millions of random files
- those reads are on demand, but during peak hours there can be many
such requests

So far they have been using ext4; after some basic tests of XFS they
observed a 40% improvement in application counters. But I'm afraid those
tests were done in an environment not even close to production (a much
smaller filesystem with far fewer files).

I want to ask what the best mkfs.xfs settings would be for such a setup.

I assume that they should use the inode64 mount option for such a large
filesystem with that number of files, but I'm a bit worried about
compatibility with the NFS version shipped by default with CentOS 6.5. I
think inode32 is totally out of scope here.

Any other hints for setting this up?
A more recent OS/kernel would probably also help a lot, right?

Also, do you know of any benchmark that can be used to simulate such a
workload? I've googled a lot, but the list of multi-threaded, small-file
oriented benchmarks is quite short. To be honest, I've found only
https://github.com/bengland2/smallfile to be close to what I need. Any
other alternatives?
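To make the write pattern concrete, here is a rough sketch of it in
Python (a toy illustration only; the paths, counts and thread model are
made up and are not their actual application):

```python
import os
import random
import tempfile
import threading

def write_small_files(base, nfiles=100, threads=4):
    """Toy model of the write side: several threads, each writing
    many 200-500 byte files into its own subdirectory."""
    def worker(tid):
        d = os.path.join(base, "dir%03d" % tid)
        os.makedirs(d, exist_ok=True)
        for i in range(nfiles):
            # random payload in the 200-500 byte range described above
            payload = os.urandom(random.randint(200, 500))
            with open(os.path.join(d, "f%07d" % i), "wb") as f:
                f.write(payload)

    workers = [threading.Thread(target=worker, args=(t,))
               for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

if __name__ == "__main__":
    base = tempfile.mkdtemp()
    write_small_files(base)
    # count every file created under base
    print(sum(len(files) for _, _, files in os.walk(base)))  # 400
```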

BR
Marcin


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs + 100TB+ storage + lots of small files + NFS
  2016-07-09 11:14 xfs + 100TB+ storage + lots of small files + NFS Marcin Sura
@ 2016-07-10  9:24 ` Ric Wheeler
  2016-07-10 23:48 ` Dave Chinner
  1 sibling, 0 replies; 3+ messages in thread
From: Ric Wheeler @ 2016-07-10  9:24 UTC (permalink / raw)
  To: Marcin Sura, xfs

On 07/09/2016 02:14 PM, Marcin Sura wrote:
> Hi,
>
> A friend of mine asked me to evaluate XFS for their use case. I currently 
> don't have physical access to their system, but here is the information 
> I've gathered so far:
>
> SAN:
> - physical storage is an FSC array with a thin-provisioned RAID 6 volume
> - volumes are 100TB+ in size
> - the array contains SSDs, which could potentially be used for the journal
> - storage is connected to the host via 10GbE iSCSI
>
> Host:
> - They are using CentOS 6.5 with the stock 2.6.32-* kernel
> - The system uses all default values; no optimization has been done
> - The OS is installed on an SSD
> - I don't know the exact CPU details, but I assume a recent multicore CPU
> - I don't know the amount of RAM installed; I assume 32GB+
>
> NFS:
> - they are exporting the filesystem via NFS to 10-20 clients (services), some VMs, 
> some bare metal
> - clients are connected via 1GbE or 10GbE links
>
> Workload:
> - they are storing tens or hundreds of millions of small files
> - the files are not all in a single directory
> - files are under 1K, usually 200 - 500 bytes
> - I assume that some NFS clients write files constantly
> - some NFS clients initiate massive reads of millions of random files
> - those reads are on demand, but during peak hours there can be many of such 
> requests
>
> So far they have been using ext4; after some basic tests of XFS they observed 
> a 40% improvement in application counters. But I'm afraid those tests were done 
> in an environment not even close to production (a much smaller filesystem with 
> far fewer files).
>
> I want to ask what the best mkfs.xfs settings would be for such a setup.
>
> I assume that they should use the inode64 mount option for such a large filesystem 
> with that number of files, but I'm a bit worried about compatibility with the NFS 
> version shipped by default with CentOS 6.5. I think inode32 is totally out of scope here.
>
> Any other hints for setting this up?
> A more recent OS/kernel would probably also help a lot, right?
>
> Also, do you know of any benchmark that can be used to simulate such a workload? 
> I've googled a lot, but the list of multi-threaded, small-file oriented benchmarks 
> is quite short. To be honest, I've found only 
> https://github.com/bengland2/smallfile to be close to what I need. Any other 
> alternatives?
>
> BR
> Marcin

I think that is a good test to explore - Ben wrote it for exactly this kind of 
workload.

For a single system (i.e., the performance of a single NFS client or a local 
file system), you could also test using fs_mark.
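
A typical fs_mark invocation approximating this workload might look
something like the following (a sketch only; the directories are
placeholders, and the flag meanings should be checked against fs_mark's
help output before relying on them):

```shell
# Four threads (one per -d directory), 100k files per loop over 5
# loops, ~400 byte files spread across 1000 subdirectories, no
# per-file sync (-S0) to match the buffered NFS write pattern.
fs_mark -S0 -s 400 -n 100000 -D 1000 -L 5 \
        -d /mnt/test/0 -d /mnt/test/1 -d /mnt/test/2 -d /mnt/test/3
```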

Regards,
Ric




* Re: xfs + 100TB+ storage + lots of small files + NFS
  2016-07-09 11:14 xfs + 100TB+ storage + lots of small files + NFS Marcin Sura
  2016-07-10  9:24 ` Ric Wheeler
@ 2016-07-10 23:48 ` Dave Chinner
  1 sibling, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2016-07-10 23:48 UTC (permalink / raw)
  To: Marcin Sura; +Cc: xfs

On Sat, Jul 09, 2016 at 01:14:37PM +0200, Marcin Sura wrote:
> Hi,
> 
> A friend of mine asked me to evaluate XFS for their use case. I currently
> don't have physical access to their system, but here is the information
> I've gathered so far:
> 
> SAN:
> - physical storage is an FSC array with a thin-provisioned RAID 6 volume
> - volumes are 100TB+ in size
> - the array contains SSDs, which could potentially be used for the
> journal
> - storage is connected to the host via 10GbE iSCSI
> 
> Host:
> - They are using CentOS 6.5 with the stock 2.6.32-* kernel

I'd suggest the NFS server use as recent a kernel/distro as possible;
this doesn't affect the client/application side OS choices. At minimum,
use a kernel that supports metadata CRCs. You'll have hundreds of
terabytes of data indexed by hundreds of gigabytes of metadata, and
you're going to want things like free inode indexing to keep inode
allocation as fast as possible as counts build up to the hundreds of
millions of inodes.
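
As a concrete sketch (the device name is a placeholder; both options are
the default in current xfsprogs and are shown explicitly here only for
older versions):

```shell
# Enable metadata CRCs (the v5 on-disk format) and the free inode
# btree at mkfs time; finobt requires crc=1.
mkfs.xfs -m crc=1,finobt=1 /dev/sdX
```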

> - The system uses all default values; no optimization has been done
> - The OS is installed on an SSD
> - I don't know the exact CPU details, but I assume a recent multicore CPU
> - I don't know the amount of RAM installed; I assume 32GB+

With a peaky random read workload, you're going to want to cache
tens of millions of inodes in RAM to get performance out of the
machine. RAM is cheap compared to storage costs - I'd suggest
hundreds of GB of RAM in the server....

> NFS:
> - they are exporting the filesystem via NFS to 10-20 clients (services), some
> VMs, some bare metal
> - clients are connected via 1GbE or 10GbE links
> 
> Workload:
> - they are storing tens or hundreds of millions of small files
> - the files are not all in a single directory

How big are the directories?

> - files are under 1K, usually 200 - 500 bytes
> - I assume that some NFS clients write files constantly
> - some NFS clients initiate massive reads of millions of random files
> - those reads are on demand, but during peak hours there can be many of
> such requests

This sort of "efficiently indexing hundreds of millions of tiny
objects" workload is what databases were designed for, not
filesystems. Yes, you can use a filesystem for this sort of
workload, but IMO it's not the right tool for this job.

> So far they have been using ext4; after some basic tests of XFS they
> observed a 40% improvement in application counters. But I'm afraid those
> tests were done in an environment not even close to production (a much
> smaller filesystem with far fewer files).
> 
> I want to ask what the best mkfs.xfs settings would be for such a setup.

How long is a piece of string?

Working out how to optimise storage to this sort of workload
requires an iterative measure/analyse/tweak approach. Anything else
is just guesswork. i.e. start with the defaults, then measure
performance, identify the bottlenecks in the system and then tweak
the appropriate knob to alleviate the bottleneck.

For example, you may find that there are things you have to change in
the NFS server config to get it to scale before you even start looking
at XFS performance....
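
For the measurement side of that loop, standard tools are enough to see
where the bottleneck is (a sketch; run these on the server during a test
load):

```shell
iostat -x 5             # per-device utilisation, queue depth, latency
nfsstat -s              # NFS server per-operation counts
cat /proc/fs/xfs/stat   # raw XFS internal counters
```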

> I assume that they should use the inode64 mount option for such a large
> filesystem with that number of files, but I'm a bit worried about
> compatibility with the NFS version shipped by default with CentOS 6.5. I
> think inode32 is totally out of scope here.

inode32 will not support hundreds of millions of inodes - you'll
ENOSPC the first AG long before that, and performance will be very
bad as all inode/directory allocation will single thread. And it
will only get worse as the inode count goes up.

As it is, inode64 will be fine for 64bit NFS clients. It's only 32
bit clients that have problems with 64 bit inode numbers, and even
then it is only a problem on older linux and non-linux clients.
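
On a 2.6.32-era kernel the option has to be given explicitly, either on
the command line or in fstab (device and mount point below are
placeholders); inode64 only became the default in kernel 3.7:

```shell
mount -o inode64 /dev/sdX /export/data
# or the equivalent /etc/fstab entry:
# /dev/sdX  /export/data  xfs  inode64  0 0
```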

> Also, do you know any benchmark which can be used to simulate such
> workload?

Test against your production workload. It's the only way to be sure
you are optimising the right things.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


