public inbox for linux-xfs@vger.kernel.org
* XFS: performance
@ 2010-11-28 22:51 Yclept Nemo
  2010-11-29  0:11 ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Yclept Nemo @ 2010-11-28 22:51 UTC (permalink / raw)
  To: xfs

After 3-4 years of using one XFS partition for every mount point
(/,/usr,/etc,/home,/tmp...) I started noticing a rapid performance
degradation. Subjectively I now feel my XFS partition is 5-10x slower
... while other partitions (ntfs,ext3) remain the same.
Now I just purchased a new hard-drive and I'm going to be copying all
my original files over onto a *new* XFS partition. When using mkfs.xfs
I'd like to optimize and avoid whatever it was that made my old XFS
partition slower than a snail.

In all cases, bsize=4096

Original hard drive (98GB Seagate, 39.8GB XFS partition)
# xfs_info /dev/sdb5
meta-data=/dev/sdb5
    isize=256
    agcount=4, agsize=2595896 blks (9.9 GB)
    sectsz=512    attr=2
data=
    bsize=4096
    blocks=10383582, imaxpct=25 (39.6 GB)
    sunit=0    swidth=0 blks
naming=version 2
    bsize=4096   ascii-ci=0
log=internal
    bsize=4096
    blocks=5070, version=2 (19.8 MB)
    sectsz=512
    sunit=0 blks, lazy-count=0
realtime =none
    extsz=4096
    blocks=0, rtextents=0


New hard drive (500GB Samsung MP4, 100GB XFS partition)
# mkfs.xfs /dev/sda5
meta-data=/dev/sda5
    isize=256
    agcount=4, agsize=6553600 blks (25.0 GB)
    sectsz=512    attr=2
data=
    bsize=4096
    blocks=26214400, imaxpct=25 (100.0 GB)
    sunit=0    swidth=0 blks
naming=version 2
    bsize=4096   ascii-ci=0
log=internal
    bsize=4096
    blocks=12800, version=2 (50 MB)
    sectsz=512
    sunit=0 blks, lazy-count=1
realtime=none
    extsz=4096
    blocks=0, rtextents=0

I was considering running "mkfs.xfs -d agcount=32 -i attr=2 -l
version=2,lazy-count=1,size=256m /dev/sda5".

Yes, I know that in xfsprogs 3.1.3 "-i attr=2 -l
version=2,lazy-count=1" are already default options. However I think I
should tweak the log size, blocksize, and data allocation group counts
beyond the default values and I'm looking for some recommendations or
input.
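
For what it's worth, mkfs.xfs can print the geometry it would create
without writing anything, via its no-modify flag, so the proposed
options can be compared against the defaults before committing (device
name as elsewhere in this thread):

```shell
# -N = "no modify": print the geometry mkfs.xfs would use, write nothing.
mkfs.xfs -N /dev/sda5
# ...and the proposed variant, for comparison:
mkfs.xfs -N -d agcount=32 -i attr=2 -l version=2,lazy-count=1,size=256m /dev/sda5
```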

I assume mkfs.xfs automatically selects optimal values, but I *have*
space to spare for a larger log section... and perhaps my old XFS
partition became sluggish when the log section had filled up, if this
is even possible.
Similarly a larger agcount should always give better performance,
right? Some resources claim that agcount should never fall below
eight.
I'm also hesitant about reducing the blocksize from the maximum of 4096
bytes, but since XFS manages my entire file-system tree, a blocksize
of 512, 1024, or even 2048 bytes might squeeze out some extra
performance. [I assume] the performance w.r.t. blocksize is:
. a larger blocksize dramatically increases large file performance,
but also increases space usage when dealing with small files.
. a smaller blocksize dramatically decreases performance for large
files, and somewhat increases performance for small files, while also
slightly increasing space usage with extra inodes(?)

I want to make it clear that I prefer performance over space efficiency.

Many thanks,
orbisvicis

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS: performance
  2010-11-28 22:51 XFS: performance Yclept Nemo
@ 2010-11-29  0:11 ` Dave Chinner
  2010-11-29  1:21   ` Yclept Nemo
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-11-29  0:11 UTC (permalink / raw)
  To: Yclept Nemo; +Cc: xfs

On Sun, Nov 28, 2010 at 10:51:04PM +0000, Yclept Nemo wrote:
> After 3-4 years of using one XFS partition for every mount point
> (/,/usr,/etc,/home,/tmp...) I started noticing a rapid performance
> degradation. Subjectively I now feel my XFS partition is 5-10x slower
> ... while other partitions (ntfs,ext3) remain the same.

Can you run some benchmarks to show this non-subjectively? Aged
filesystems will be slower than new filesystems, and it should be
measurable. Also, knowing what your filesystem contains (number of
files, used capacity, whether you have run it near ENOSPC for
extended periods of time, etc) would help us understand the way the
filesystem has aged as well.
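
A minimal sketch of gathering that data for the old partition from this
thread; xfs_db is opened read-only (-r) so this is safe on a mounted
filesystem:

```shell
df -h /dev/sdb5                  # used capacity
df -i /dev/sdb5                  # inode (file) counts
xfs_db -r -c frag /dev/sdb5      # file fragmentation factor
xfs_db -r -c freesp /dev/sdb5    # free-space fragmentation histogram
```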

> Now I just purchased a new hard-drive and I'm going to be copying all
> my original files over onto a *new* XFS partition. When using mkfs.xfs
> I'd like to optimize and avoid whatever it was that made my old XFS
> partition slower than a snail.

Nobody can give definite advice without first quantifying and then
understanding the aging slowdown you've been seeing.

....

> I was considering running "mkfs.xfs -d agcount=32 -i attr=2 -l
> version=2,lazy-count=1,size=256m /dev/sda5".
> 
> Yes, I know that in xfsprogs 3.1.3 "-i attr=2 -l
> version=2,lazy-count=1" are already default options. However I think I
> should tweak the log size, blocksize, and data allocation group counts
> beyond the default values and I'm looking for some recommendations or
> input.

Why do you think you should tweak them?

> I assume mkfs.xfs automatically selects optimal values, but I *have*
> space to spare for a larger log section... and perhaps my old XFS
> partition became sluggish when the log section had filled up, if this
> is even possible.

Well, you had a very small log (20MB) on the original filesystem,
and so as the filesystem ages (e.g. free space fragments), each
allocation/free transaction would be larger than on a new filesystem
because of the larger btrees that need to be manipulated. With such
a small log, that could be part of the reason for the slowdown you
were seeing.  However, without knowing what your filesystem looks
like physically, this is only speculation.

That being said, the larger log (50MB) that the new filesystem has
shouldn't have the same degree of degradation under the same aging
characteristics. It's probably not necessary to go larger than ~100MB
for a 100GB partition on a single spindle...

> Similarly a larger agcount should always give better performance,
> right?

No.

> Some resources claim that agcount should never fall below
> eight.

If those resources are right, then why would we default to 4 AGs for
filesystems on single spindles?

> I'm also hesitant about reducing the blocksize from a maximum of 4096
> bytes, but since XFS manages my entire file-system tree, a blocksize
> of 512, 1024, or even 2048 bytes might squeeze out some extra
> performance. [I assume] the performance w.r.t. blocksize is:
>  . a larger blocksize dramatically increases large file performance,

No, it doesn't - maybe a few percent difference when you've got
multiple GB/s throughput, but it's mostly noise for single spindles.
Extent based allocation makes block size pretty much irrelevant for
sequential write performance...

> but also increases space usage when dealing with small files.

Not significantly enough to matter for modern disks.

> . a smaller blocksize dramatically decreases performance for large
> files,

See above.

> and somewhat increases performance for small files,

Not really.

> while also
> slightly increasing space usage with extra inodes(?)

????

> I want to make it clear that I prefer performance over space efficiency.

That's what the defaults are biased towards.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: XFS: performance
  2010-11-29  0:11 ` Dave Chinner
@ 2010-11-29  1:21   ` Yclept Nemo
  2010-11-29  1:59     ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Yclept Nemo @ 2010-11-29  1:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Nov 29, 2010 at 12:11 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Nov 28, 2010 at 10:51:04PM +0000, Yclept Nemo wrote:
>> After 3-4 years of using one XFS partition for every mount point
>> (/,/usr,/etc,/home,/tmp...) I started noticing a rapid performance
>> degradation. Subjectively I now feel my XFS partition is 5-10x slower
>> ... while other partitions (ntfs,ext3) remain the same.
>
> Can you run some benchmarks to show this non-subjectively? Aged
> filesystems will be slower than new filesystems, and it should be
> measurable. Also, knowing what your filesystem contains (number of
> files, used capacity, whether you have run it near ENOSPC for
> extended periods of time, etc) would help us understand the way the
> filesystem has aged as well.

Certainly, if you are interested I can run either dbench or bonnie++
tests comparing an XFS partition (with default values from xfsprogs
3.1.3) on the new hard-drive to the existing partition on the old. As
I'm not sure what you're looking for, what command parameters should I
profile against?

The XFS partition in question is 39.61GB in size, of which 30.71GB are
in use (8.90GB free). It contains a typical Arch Linux installation
with many programs and many personal files. Usage pattern is as follows:
. equal runtime split between (near ENOSPC) and (approximately 10.0GB free)
. mostly small files, one or two exceptions
. often reach ENOSPC through carelessness
. run xfs_fsr very often

Breakdown of space:
/home: 14.6GB
/usr: 12.3GB
/var: 1005.1 MB
/etc: 58.8 MB
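
As a back-of-the-envelope check on those figures (pure arithmetic, no
XFS tooling involved):

```shell
# 30.71GB used of 39.61GB: average utilisation of the old partition.
pct=$(awk 'BEGIN { printf "%.0f", 30.71 / 39.61 * 100 }')
echo "${pct}% used"
```

which prints "78% used" on average - close to the danger zone once free
space dips further.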

>> Now I just purchased a new hard-drive and I'm going to be copying all
>> my original files over onto a *new* XFS partition. When using mkfs.xfs
>> I'd like to optimize and avoid whatever it was that made my old XFS
>> partition slower than a snail.
>
> Nobody can give definite advice without first quantifying and then
> understanding the aging slowdown you've been seeing.
>
> ....

(see above)

>> I was considering running "mkfs.xfs -d agcount=32 -i attr=2 -l
>> version=2,lazy-count=1,size=256m /dev/sda5".
>>
>> Yes, I know that in xfsprogs 3.1.3 "-i attr=2 -l
>> version=2,lazy-count=1" are already default options. However I think I
>> should tweak the log size, blocksize, and data allocation group counts
>> beyond the default values and I'm looking for some recommendations or
>> input.
>
> Why do you think you should tweak them?

To avoid the aging slowdown as well as to increase read/write/metadata
performance with small files.

>> I assume mkfs.xfs automatically selects optimal values, but I *have*
>> space to spare for a larger log section... and perhaps my old XFS
>> partition became sluggish when the log section had filled up, if this
>> is even possible.
>
> Well, you had a very small log (20MB) on the original filesystem,
> and so as the filesystem ages (e.g. free space fragments), each
> allocation/free transaction would be larger than on a new filesystem
> because of the larger btrees that need to be manipulated. With such
> a small log, that could be part of the reason for the slowdown you
> were seeing.  However, without knowing what your filesystem looks
> like physically, this is only speculation.
>
> That being said, the larger log (50MB) that the new filesystem has
> shouldn't have the same degree of degradation under the same aging
> characteristics. It's probably not necessary to go larger than ~100MB
> for a 100GB partition on a single spindle...

In this case I'll aim for a large log section, probably 256 or 512MB,
unless it will impede performance. That way there will be no problems
when I resize the partition to 200GB ... 300GB ... up to a maximum of
450GB. In fact the manual page of xfs_growfs - which might be outdated
- warns that log resizing is not implemented, so it would probably be
prudent to create an overly-large log section.

>> Similarly a larger agcount should always give better performance,
>> right?
>
> No.
>
>> Some resources claim that agcount should never fall below
>> eight.
>
> If those resources are right, then why would we default to 4 AGs for
> filesystems on single spindles?

Obviously you are against modifying the agcount - I won't touch it :)

>> I'm also hesitant about reducing the blocksize from a maximum of 4096
>> bytes, but since XFS manages my entire file-system tree, a blocksize
>> of 512, 1024, or even 2048 bytes might squeeze out some extra
>> performance. [I assume] the performance w.r.t. blocksize is:
>>  . a larger blocksize dramatically increases large file performance,
>
> No, it doesn't - maybe a few percent difference when you've got
> multiple GB/s throughput, but it's mostly noise for single spindles.
> Extent based allocation makes block size pretty much irrelevant for
> sequential write performance...
>
>> but also increases space usage when dealing with small files.
>
> Not significantly enough to matter for modern disks.
>
>> . a smaller blocksize dramatically decreases performance for large
>> files,
>
> See above.
>
>> and somewhat increases performance for small files,
>
> Not really.
>
>> while also
>> slightly increasing space usage with extra inodes(?)
>
> ????

Not actually sure what I intended. My knowledge of file-systems
depends on Google and that statement was only a shot in the dark.
However, you've convinced me not to change the blocksize (keep in mind
I'm running an entire Linux installation from this one XFS partition,
small files included). If the blocksize option is so
performance-independent, why does it even exist?

>> I want to make it clear that I prefer performance over space efficiency.
>
> That's what the defaults are biased towards.

Good to know the defaults are sane - yet another reason not to modify
the blocksize and data allocation group counts.

Thanks,
orbisvicis



* Re: XFS: performance
  2010-11-29  1:21   ` Yclept Nemo
@ 2010-11-29  1:59     ` Dave Chinner
       [not found]       ` <AANLkTikw086Z_66cz_U-EdFQx14TXP6XmiG-KyLN4BLo@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-11-29  1:59 UTC (permalink / raw)
  To: Yclept Nemo; +Cc: xfs

On Mon, Nov 29, 2010 at 01:21:11AM +0000, Yclept Nemo wrote:
> On Mon, Nov 29, 2010 at 12:11 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sun, Nov 28, 2010 at 10:51:04PM +0000, Yclept Nemo wrote:
> >> After 3-4 years of using one XFS partition for every mount point
> >> (/,/usr,/etc,/home,/tmp...) I started noticing a rapid performance
> >> degradation. Subjectively I now feel my XFS partition is 5-10x slower
> >> ... while other partitions (ntfs,ext3) remain the same.
> >
> > Can you run some benchmarks to show this non-subjectively? Aged
> > filesystems will be slower than new filesystems, and it should be
> > measurable. Also, knowing what your filesystem contains (number of
> > files, used capacity, whether you have run it near ENOSPC for
> > extended periods of time, etc) would help us understand the way the
> > filesystem has aged as well.
> 
> Certainly, if you are interested I can run either dbench or bonnie++
> tests comparing an XFS partition (with default values from xfsprogs
> 3.1.3) on the new hard-drive to the existing partition on the old. As
> I'm not sure what you're looking for, what command parameters should I
> profile against?
> 
> The XFS partition in question is 39.61GB in size, of which 30.71GB are
> in use (8.90GB free). It contains a typical Arch Linux installation
> with many programs and many personal files. Usage pattern as follows:
> . equal runtime split between (near ENOSPC) and (approximately 10.0GB free)

There's your problem - it's a well-known fact that running XFS at
more than 85-90% capacity for extended periods of time causes free
space fragmentation, and that results in performance degradation.

> . mostly small files, one or two exceptions
> . often reach ENOSPC through carelessness
> . run xfs_fsr very often

And xfs_fsr is also known to cause free space fragmentation when run
on filesystems with not much space available...

> >> I was considering running "mkfs.xfs -d agcount=32 -i attr=2 -l
> >> version=2,lazy-count=1,size=256m /dev/sda5".
> >>
> >> Yes, I know that in xfsprogs 3.1.3 "-i attr=2 -l
> >> version=2,lazy-count=1" are already default options. However I think I
> >> should tweak the log size, blocksize, and data allocation group counts
> >> beyond the default values and I'm looking for some recommendations or
> >> input.
> >
> > Why do you think you should tweak them?
> 
> To avoid the aging slowdown as well as to increase read/write/metadata
> performance with small files.

According to your description above, tweaking these will not help
you at all.

> >> I assume mkfs.xfs automatically selects optimal values, but I *have*
> >> space to spare for a larger log section... and perhaps my old XFS
> >> partition became sluggish when the log section had filled up, if this
> >> is even possible.
> >
> > Well, you had a very small log (20MB) on the original filesystem,
> > and so as the filesystem ages (e.g. free space fragments), each
> > allocation/free transaction would be larger than on a new filesystem
> > because of the larger btrees that need to be manipulated. With such
> > a small log, that could be part of the reason for the slowdown you
> > were seeing.  However, without knowing what your filesystem looks
> > like physically, this is only speculation.
> >
> > That being said, the larger log (50MB) that the new filesystem has
> > shouldn't have the same degree of degradation under the same aging
> > characteristics. It's probably not necessary to go larger than ~100MB
> > for a 100GB partition on a single spindle...
> 
> In this case I'll aim for a large log section, probably 256 or 512MB,
> unless it will impede performance.

It can, because once you get into a tail-pushing situation it'll
trigger lots and lots of metadata IO. That's probably not ideal for
a laptop drive. You'd do better to keep the log at around 100MB and
use delayed logging....
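
For reference, delayed logging is just a mount option (available since
kernel 2.6.35); the lines below are an illustrative config fragment, not
from the thread, with the device/mountpoint names used elsewhere here:

```shell
# At mount time:
mount -o delaylog /dev/sda5 /mnt/new
# Or persistently, via /etc/fstab:
#   /dev/sda5   /   xfs   defaults,delaylog   0 1
```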

> That way there will be no problems
> when I resize the partition to 200GB ... 300GB... up to a maximum of
> 450GB. In fact the manual page of xfs_growfs - which might be outdated
> - warns that log resizing is not implemented, so it would probably be
> auspicious to create an overly-large log section.
> 
> >> Similarly a larger agcount should always give better performance,
> >> right?
> >
> > No.
> >
> >> Some resources claim that agcount should never fall below
> >> eight.
> >
> > If those resources are right, then why would we default to 4 AGs for
> > filesystems on single spindles?
> 
> Obviously you are against modifying the agcount - I won't touch it :)

No, what I'm pointing out is that <some random web reference> is not
a good guide for tuning an XFS filesystem. You need to _understand_
what changing that knob does before you change it. If you don't
understand what it does, then don't change it...

> Not actually sure what I intended. My knowledge of file-systems
> depends on Google and that statement was only a shot in the dark.
> However, you've convinced me not to change the blocksize (keep in mind
> I'm running an entire Linux installation from this one XFS partition,
> small files included).

Sure, I do that too. My workstation has a 220GB root partition that
contains all my kernel trees, build areas, etc. It has agcount=16
because I'm running on an 8c machine and do 8-way parallel builds, a
log of 105MB, and I'm using delaylog....

> If the blocksize option is so
> performance-independent, why does it even exist?

Because there are situations where it makes sense to change the
block size. That isn't really a general use root filesystem,
though...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: XFS: performance
       [not found]       ` <AANLkTikw086Z_66cz_U-EdFQx14TXP6XmiG-KyLN4BLo@mail.gmail.com>
@ 2010-11-29  3:57         ` Yclept Nemo
  2010-11-29  5:41           ` Stan Hoeppner
  2010-11-29  8:38           ` Michael Monnerie
  0 siblings, 2 replies; 11+ messages in thread
From: Yclept Nemo @ 2010-11-29  3:57 UTC (permalink / raw)
  To: xfs

> If only I had a dollar for every time I forgot to cc the mailing list within Gmail:


On Mon, Nov 29, 2010 at 1:59 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Mon, Nov 29, 2010 at 01:21:11AM +0000, Yclept Nemo wrote:
>> On Mon, Nov 29, 2010 at 12:11 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Sun, Nov 28, 2010 at 10:51:04PM +0000, Yclept Nemo wrote:
>> >> After 3-4 years of using one XFS partition for every mount point
>> >> (/,/usr,/etc,/home,/tmp...) I started noticing a rapid performance
>> >> degradation. Subjectively I now feel my XFS partition is 5-10x slower
>> >> ... while other partitions (ntfs,ext3) remain the same.
>> >
>> > Can you run some benchmarks to show this non-subjectively? Aged
>> > filesystems will be slower than new filesystems, and it should be
>> > measurable. Also, knowing what your filesystem contains (number of
>> > files, used capacity, whether you have run it near ENOSPC for
>> > extended periods of time, etc) would help us understand the way the
>> > filesystem has aged as well.
>>
>> Certainly, if you are interested I can run either dbench or bonnie++
>> tests comparing an XFS partition (with default values from xfsprogs
>> 3.1.3) on the new hard-drive to the existing partition on the old. As
>> I'm not sure what you're looking for, what command parameters should I
>> profile against?
>>
>> The XFS partition in question is 39.61GB in size, of which 30.71GB are
>> in use (8.90GB free). It contains a typical Arch Linux installation
>> with many programs and many personal files. Usage pattern as follows:
>> . equal runtime split between (near ENOSPC) and (approximately 10.0GB free)
>
> There's your problem - it's a well known fact that running XFS at
> more than 85-90% capacity for extended periods of time causes free
> space fragmentation and that results in performance degradation.
>
>> . mostly small files, one or two exceptions
>> . often reach ENOSPC through carelessness
>> . run xfs_fsr very often
>
> And xfs_fsr is also known to cause free space fragmentation when run
> on filesystems with not much space available...

Pheww... I'm relieved to learn that my performance degradation will be
alleviated with this hard-drive update. Which also means I need no
longer be so obsessive-compulsive when tweaking the second incarnation
of my XFS file-system.

>> >> Similarly a larger agcount should always give better performance,
>> >> right?
>> >
>> > No.
>> >
>> >> Some resources claim that agcount should never fall below
>> >> eight.
>> >
>> > If those resources are right, then why would we default to 4 AGs for
>> > filesystems on single spindles?
>>
>> Obviously you are against modifying the agcount - I won't touch it :)
>
> No, what I'm pointing out is that <some random web reference> is not
> a good guide for tuning an XFS filesystem. You need to _understand_
> what changing that knob does before you change it. If you don't
> understand what it does, then don't change it...

As I understand it, an XFS filesystem is divided into several
allocation groups, each containing a superblock as well as private
inode and free-space btrees, and thus increasing the AG count increases
parallelization. I simply assumed the process was CPU-bound, not
disk-bound. Though now that spindles have been mentioned, I realize it
makes sense to limit the amount of parallel access to a single hard
drive; I've often noticed XFS come to a crawl when I simultaneously run
multiple IO-intensive operations.

>> Not actually sure what I intended. My knowledge of file-systems
>> depends on Google and that statement was only a shot in the dark.
>> However, you've convinced me not to change the blocksize (keep in mind
>> I'm running an entire Linux installation from this one XFS partition,
>> small files included).
>
> Sure, I do that too. My workstation has a 220GB root partition that
> contains all my kernel trees, build areas, etc. It has agcount=16
> because I'm running on an 8c machine and do 8-way parallel builds, a
> log of 105MB, and I'm using delaylog....

You mention an eight-core machine (8c?). Since I operate a dual-core
system, would it make sense to increase my AG count slightly, to five
or six?

Hm.. I was going to wait till 2.6.39 but I think I'll enable delayed
logging right now!

>> If the blocksize option is so
>> performance-independent, why does it even exist?
>
> Because there are situations where it makes sense to change the
> block size. That isn't really a general use root filesystem,
> though...

Now for the implementation of transferring the data to the new XFS
partition/hard-drive...
I was originally going to use "rsync -avxAHX" until I stumbled across
this list's thread, "ENOSPC at 90% with plenty of inodes", which
mentioned xfsdump and xfsrestore. I now have three questions:

. since xfsdump and xfsrestore access the base file-system structure,
will these tools be able to copy everything?
  - files
  - special files (sockets/fifos)
  - permissions
  - attributes
  - acls (it is in the man page, but I list it here for completeness)
  - symlinks
  - hard links
  - extended attributes
  - character/block devices
  - modification times
  - etc... everything: anything rsync could copy and more

. since xfsdump and xfsrestore access the base file-system structure,
will they be able to:
  - update creation-time XFS parameters (from the original
file-system) to adapt to the new XFS file-system in order to benefit
from performance and capability improvements. For example:
     . adapt the log section from version 1 to version 2
     . modify the agcount
     . update the metadata attributes to version 2
     . enable lazy-count for all files
     . etc
  - reduce fragmentation and free-space fragmentation upon
xfsrestore? (or does xfsrestore simply copy the old XFS structure
bit-by-bit into the new file-system?)

. if not, are there any reasons to nonetheless prefer
xfsdump/xfsrestore over rsync?
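
For context, the usual idiom pipes a level-0 dump straight into a
restore onto the freshly made filesystem (mountpoints here are
hypothetical examples):

```shell
# Dump the old filesystem to stdout and restore onto the new mountpoint.
xfsdump -l 0 - /mnt/old | xfsrestore - /mnt/new
```

Note that xfsrestore recreates files through the ordinary filesystem
interfaces, so everything is freshly allocated under the new
filesystem's mkfs-time parameters; it is not a bit-for-bit copy.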

If any of this is already mentioned in the xfsdump/restore man pages,
I apologize; I simply don't want to wait till tomorrow to begin the
backup/restore process.

Sincerely,
orbisvicis



* Re: XFS: performance
  2010-11-29  3:57         ` Yclept Nemo
@ 2010-11-29  5:41           ` Stan Hoeppner
  2010-11-30  4:29             ` Dave Chinner
  2010-11-29  8:38           ` Michael Monnerie
  1 sibling, 1 reply; 11+ messages in thread
From: Stan Hoeppner @ 2010-11-29  5:41 UTC (permalink / raw)
  To: xfs

Yclept Nemo put forth on 11/28/2010 9:57 PM:

> Pheww... I'm relieved to learn that my performance degradation will be
> alleviated with this hard-drive update. Which also means I need no
> longer be so obsessive-compulsive when tweaking the second incarnation
> of my XFS file-system.

You can also alleviate this problem with careful planning.  Have a /boot
and / filesystem that are painfully small, making / just large enough to
hold the OS files, log files, etc.  Make one or more XFS filesystems to
hold your data.  If one of these becomes slow due to hitting the 85%+
mark, you can simply copy all the data to an external device or NFS
mounted directory, delete and then remake the filesystem, preferably
larger if you have free space on the drive.  When you copy all the files
back to the new filesystem, your performance will be restored.  The
catch is that you need to make the new one bigger by at least 15-20%,
more if you have the space, to prevent arriving at the same situation
soon after doing all this.

> As I understand an XFS system is demarcated into several allocation
> groups each containing a superblock as well as private inode and
> free-space btrees, and thus increasing the AG count increases
> parallelization. I simply assumed the process was CPU-bound, not disk
> bound. Though by mentioning spindles, I realized it makes sense to
> limit the amount of parallel access to a single hard drive; I've often
> noticed XFS come to a crawl when I simultaneously call multiple
> intensive IO operations.

You didn't notice XFS come to a crawl.  You noticed your single disk
come to a crawl.  Even 15k rpm SAS drives max out at ~300 seeks/sec.  A
5.4k rpm laptop SATA drive will be lucky to sustain 100 seeks/sec.  A
cheap SATA SSD will do multiple thousands of seeks/sec, as will a RAID
array with 8 or more 15k SAS drives and a decent sized write cache, say
256MB or more.

Multiple AGs really shine with large RAID arrays with many spindles.
They are far less relevant WRT performance of a single disk.

> You mention an eight-core machine (8c?). Since I operate a dual-core
> system, would it make sense to increase my AG count slightly, to five
> or six?

Dave didn't mention the disk configuration of his "workstation".  I'm
guessing he's got a local RAID setup with 8-16 drives.  AG count has a
direct relationship to the storage hardware, not the number of CPUs
(cores) in the system.  If you have a 24 core system (2x Magny Cours)
and a single disk, creating an FS with 24 AGs will give you nothing, and
may actually impede performance due to all the extra head seeking across
those 24 AGs.

> Hm.. I was going to wait till 2.6.39 but I think I'll enable delayed
> logging right now!

Delayed logging can definitely help increase write throughput to a
single disk.  It pushes some of the I/O bottleneck into CPU/memory
territory for a short period of time.  Keep in mind that data must
eventually be written to disk, so you are merely delaying the physical
disk bottleneck for a while, as the name implies.  Also note that it
will do absolutely nothing for read performance.

> Now for the implementation of transferring the data to new XFS
> partition/hard-drive...
> I was originally going to use  "rsync -avxAHX" until I stumbled across
> this list's thread, "ENOSPC at 90% with plenty of inodes" which
> mentioned xfsdump and xfsrestore. I now have three questions:

If xfsdump/xfsrestore don't turn out to be a viable solution...

Just create a new big XFS filesystem on the new disk, such as you have
now, with the defaults.  Enter runlevel 2, stop every daemon you can
without blowing things up, and "cp -a" everything over to the new
filesystem.  Modify /etc/lilo or /etc/grub accordingly on the new disk
to use the new disk, and burn an MBR to the new disk.  Reboot the
machine, enter BIOS, set new disk as boot, and boot.  It should be that
simple.  If it doesn't work, change the boot disk in the BIOS, boot the
old disk, and troubleshoot.
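
One way that procedure might look as commands - a sketch only; the
device names, the directory list, and the grub-install invocation are
illustrative assumptions, not part of the thread:

```shell
# From runlevel 2 with as many daemons stopped as possible:
mkfs.xfs /dev/sda5                 # new filesystem, default geometry
mount /dev/sda5 /mnt/new
cp -a /bin /boot /etc /home /lib /opt /root /sbin /srv /usr /var /mnt/new/
mkdir -p /mnt/new/proc /mnt/new/sys /mnt/new/dev /mnt/new/tmp /mnt/new/mnt
chmod 1777 /mnt/new/tmp            # /tmp needs the sticky bit
# Then point the new disk's grub/lilo config at the new root and install
# a boot sector on the new disk, e.g. with grub-legacy:
#   grub-install --root-directory=/mnt/new /dev/sda
```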

This is pretty much exactly what I did when I replaced the drive in my
home MX/Samba/etc server about a year ago, although I had many
partitions instead of one, all EXT2 not XFS, and probably many more
daemons running than your workstation.  I copied each FS separately, and
avoided /proc, which turned out to be a mistake.  There are apparently a
few things in /proc that the kernel doesn't create anew on each boot, so
I had to go back and copy those individually; I can't recall now exactly
what they were.  Anyway, cp _everything_ over, ignore any errors, and
you should be ok.

-- 
Stan



* Re: XFS: performance
  2010-11-29  3:57         ` Yclept Nemo
  2010-11-29  5:41           ` Stan Hoeppner
@ 2010-11-29  8:38           ` Michael Monnerie
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Monnerie @ 2010-11-29  8:38 UTC (permalink / raw)
  To: xfs; +Cc: Yclept Nemo


On Montag, 29. November 2010 Yclept Nemo wrote:
>   -If not, are there any reasons to nonetheless prefer
> xfsdump/xfsrestore over rsync?

Just do an rsync. Since rsync writes all the data out fresh, XFS can lay 
it out contiguously anyway. No need for special tools.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

// ****** Radio interview about spam ******
// http://www.it-podcast.at/archiv.html#podcast-100716
// 
// House for sale: http://zmi.at/langegg/


* Re: XFS: performance
  2010-11-29  5:41           ` Stan Hoeppner
@ 2010-11-30  4:29             ` Dave Chinner
  2010-11-30  4:50               ` Stan Hoeppner
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-11-30  4:29 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Sun, Nov 28, 2010 at 11:41:35PM -0600, Stan Hoeppner wrote:
> Yclept Nemo put forth on 11/28/2010 9:57 PM:
> > You mention an eight-core machine (8c?). Since I operate a dual-core
> > system, would it make sense to increase my AG count slightly, to five
> > or six?
> 
> Dave didn't mention the disk configuration of his "workstation".  I'm
> guessing he's got a local RAID setup with 8-16 drives.

2 SSDs in RAID0.

> AG count has a
> direct relationship to the storage hardware, not the number of CPUs
> (cores) in the system.

Actually, I used 16 AGs because it's twice the number of CPU cores
and I want to make sure that CPU parallel workloads (e.g. make -j 8)
don't serialise on AG locks during allocation. IOWs, I laid it out
that way precisely because of the number of CPUs in the system...

And to point out the not-so-obvious, this is the _default layout_
that mkfs.xfs in the debian squeeze installer came up with. IOWs,
mkfs.xfs did exactly what I wanted without me having to tweak
_anything_.

> If you have a 24 core system (2x Magny Cours)
> and a single disk, creating an FS with 24 AGs will give you nothing, and
> may actually impede performance due to all the extra head seeking across
> those 24 AGs.

In that case, you are right. Single spindle SRDs go backwards in
performance pretty quickly once you go over 4 AGs...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS: performance
  2010-11-30  4:29             ` Dave Chinner
@ 2010-11-30  4:50               ` Stan Hoeppner
  2010-11-30  7:51                 ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Stan Hoeppner @ 2010-11-30  4:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave Chinner put forth on 11/29/2010 10:29 PM:
> On Sun, Nov 28, 2010 at 11:41:35PM -0600, Stan Hoeppner wrote:
>> Yclept Nemo put forth on 11/28/2010 9:57 PM:
>>> You mention an eight-core machine (8c?). Since I operate a dual-core
>>> system, would it make sense to increase my AG count slightly, to five
>>> or six?
>>
>> Dave didn't mention the disk configuration of his "workstation".  I'm
>> guessing he's got a local RAID setup with 8-16 drives.
> 
> 2 SSDs in RAID0.

From an IOPS and throughput perspective, very similar to my guess.
Curious, are those Intel, OCZ, or other SSDs?  Which model,
specifically?  Benchmark data?  I ask as all the results I find on the
web for SSDs are from Windows 7 machines. :(  I'd like to see some Linux
results.

>> AG count has a
>> direct relationship to the storage hardware, not the number of CPUs
>> (cores) in the system.
> 
> Actually, I used 16 AGs because it's twice the number of CPU cores
> and I want to make sure that CPU parallel workloads (e.g. make -j 8)
> don't serialise on AG locks during allocation. IOWs, I laid it out
> that way precisely because of the number of CPUs in the system...

And that makes perfect sense, assuming you have a sufficiently speedy
storage device, which you do.

> And to point out the not-so-obvious, this is the _default layout_
> that mkfs.xfs in the debian squeeze installer came up with. IOWs,
> mkfs.xfs did exactly what I wanted without me having to tweak
> _anything_.

Forgive me for I've not looked at the code.  How exactly does mkfs.xfs
determine the AG count?  If you'd had a single 7.2k SATA drive instead
of 2 RAID0 SSDs, would it have still given you 16 AGs?  If so, I'd say
that's a bug.

>> If you have a 24 core system (2x Magny Cours)
>> and a single disk, creating an FS with 24 AGs will give you nothing, and
>> may actually impede performance due to all the extra head seeking across
>> those 24 AGs.
> 
> In that case, you are right. Single spindle SRDs go backwards in
> performance pretty quickly once you go over 4 AGs...

That was the point I was making originally.  AG count should be balanced
between storage device performance and number of cores, not strictly one
or the other.  True?  How does mkfs.xfs strike that balance?  Or does
it, if using defaults?

-- 
Stan


* Re: XFS: performance
  2010-11-30  4:50               ` Stan Hoeppner
@ 2010-11-30  7:51                 ` Dave Chinner
  2010-12-01  0:47                   ` Stan Hoeppner
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-11-30  7:51 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: xfs

On Mon, Nov 29, 2010 at 10:50:33PM -0600, Stan Hoeppner wrote:
> Dave Chinner put forth on 11/29/2010 10:29 PM:
> > On Sun, Nov 28, 2010 at 11:41:35PM -0600, Stan Hoeppner wrote:
> >> Yclept Nemo put forth on 11/28/2010 9:57 PM:
> >>> You mention an eight-core machine (8c?). Since I operate a dual-core
> >>> system, would it make sense to increase my AG count slightly, to five
> >>> or six?
> >>
> >> Dave didn't mention the disk configuration of his "workstation".  I'm
> >> guessing he's got a local RAID setup with 8-16 drives.
> > 
> > 2 SSDs in RAID0.
> 
> From an IOPS and throughput perspective, very similar to my guess.
> Curious, are those Intel, OCZ, or other SSDs?  Which model,
> specifically?  Benchmark data?  I ask as all the results I find on the
> web for SSDs are from Windows 7 machines. :(  I'd like to see some Linux
> results.

Cheap-as-it-gets 120GB SandForce 1200 drives. In RAID0, I'm getting about
450MB/s sequential write, a little more for read. I'm seeing up to
12-14k random 4k writes per drive through XFS. Other than that I
didn't bother with any more benchmarks because it was clearly Fast
Enough.

> > And to point out the not-so-obvious, this is the _default layout_
> > that mkfs.xfs in the debian squeeze installer came up with. IOWs,
> > mkfs.xfs did exactly what I wanted without me having to tweak
> > _anything_.
> 
> Forgive me for I've not looked at the code.  How exactly does mkfs.xfs
> determine the AG count?  If you'd had a single 7.2k SATA drive instead
> of 2 RAID0 SSDs, would it have still given you 16 AGs?  If so, I'd say
> that's a bug.

No, it detected the RAID configuration. 16 AGs is the default for a
RAID device, 4 AGs is used if RAID is not detected.
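
One way to see those defaults, and to override agcount, without touching
a real disk is to point mkfs.xfs at a sparse file; -N makes it print the
geometry it would create without writing anything (the image path and
size below are arbitrary, and xfsprogs must be installed):

```shell
# Dry-run illustration of AG count selection against a sparse file.
truncate -s 10g /tmp/xfs-demo.img

# Defaults on a plain file (no RAID detected): typically agcount=4
mkfs.xfs -N /tmp/xfs-demo.img | grep agcount

# Explicit override, e.g. to match a parallel "make -j" workload
mkfs.xfs -N -d agcount=16 /tmp/xfs-demo.img | grep agcount

rm -f /tmp/xfs-demo.img
```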

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS: performance
  2010-11-30  7:51                 ` Dave Chinner
@ 2010-12-01  0:47                   ` Stan Hoeppner
  0 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2010-12-01  0:47 UTC (permalink / raw)
  To: xfs

Dave Chinner put forth on 11/30/2010 1:51 AM:

> No, it detected the RAID configuration. 16 AGs is the default for a
> RAID device, 4 AGs is used if RAID is not detected.

So, as a general rule of thumb, one should have as many AGs as the
maximum number of potential writers, assuming the underlying storage
device can handle the I/O?  For instance, on a quad socket 12 core Magny
Cours system with 48 total cores, if one has an application with 48
writing threads, one should have at least 48 AGs?

And if the default is 4 AGs per physical disk, does this mean one
should have at least 12 drives in the stripe width?  I'm thinking of
SRDs in my example, not SSDs.  And obviously if these are heavy writing
threads, a single 2 GHz core can easily outrun a single disk, so one
would probably want at least 48 SRDs if not 96 in the stripe.  Yes?
Obviously there are potentially many other factors that may come into
play in this equation.  I'm just looking for a general formula one
should use as a baseline template, as 16 AGs doesn't seem sufficient
for every combination of application, core/thread count, and SRD RAID
configuration.

-- 
Stan


end of thread, other threads:[~2010-12-01  0:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-28 22:51 XFS: performance Yclept Nemo
2010-11-29  0:11 ` Dave Chinner
2010-11-29  1:21   ` Yclept Nemo
2010-11-29  1:59     ` Dave Chinner
     [not found]       ` <AANLkTikw086Z_66cz_U-EdFQx14TXP6XmiG-KyLN4BLo@mail.gmail.com>
2010-11-29  3:57         ` Yclept Nemo
2010-11-29  5:41           ` Stan Hoeppner
2010-11-30  4:29             ` Dave Chinner
2010-11-30  4:50               ` Stan Hoeppner
2010-11-30  7:51                 ` Dave Chinner
2010-12-01  0:47                   ` Stan Hoeppner
2010-11-29  8:38           ` Michael Monnerie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox