public inbox for linux-xfs@vger.kernel.org
* Question regarding performance on big files.
@ 2010-09-20 17:04 Mathieu AVILA
  2010-09-20 19:48 ` Stan Hoeppner
  0 siblings, 1 reply; 6+ messages in thread
From: Mathieu AVILA @ 2010-09-20 17:04 UTC (permalink / raw)
  To: xfs



  Hello XFS team,

I have run into trouble with XFS, but excuse me if this question has 
been asked a dozen times.

I am filling a very big file on an XFS filesystem on Linux that sits 
on a software RAID 0. Performance is very good until I hit 2 "holes" 
during which my write stalls for a few seconds.
Mkfs parameters:
mkfs.xfs -b size=4096 -s size=4096 -d agcount=2 -i size=2048
The RAID 0 is made of 2 SATA disks of 500 GB each.

My test is just running "dd" with 8M blocks:
dd if=/dev/zero of=/DATA/big bs=8M
(/DATA is the XFS file system)

The system is basically a RHEL5 with a 2.6.18 kernel and XFS packages 
from CentOS.

The problem happens twice: the first time around 210 GB and the second 
time around 688 GB (the performance hole and response time are bigger 
the second time -- around 20 seconds).

Do you have any clue? Do my mkfs parameters make sense? The goal here 
is really to have something that can store big files at a constant 
throughput -- the test is designed on purpose to exercise that.

-- 
*Mathieu Avila*
IT & Integration Engineer
mathieu.avila@opencubetech.com

OpenCube Technologies http://www.opencubetech.com
Parc Technologique du Canal, 9 avenue de l'Europe
31520 Ramonville St Agne - FRANCE
Tel. : +33 (0) 561 285 606 - Fax : +33 (0) 561 285 635


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Question regarding performance on big files.
  2010-09-20 17:04 Question regarding performance on big files Mathieu AVILA
@ 2010-09-20 19:48 ` Stan Hoeppner
  2010-09-22 10:26   ` Mathieu AVILA
  0 siblings, 1 reply; 6+ messages in thread
From: Stan Hoeppner @ 2010-09-20 19:48 UTC (permalink / raw)
  To: xfs

Mathieu AVILA put forth on 9/20/2010 12:04 PM:
>  Hello XFS team,
> 
> I have run into trouble with XFS, but excuse me if this question has
> been asked a dozen times.
> 
> I am filling a very big file on an XFS filesystem on Linux that sits
> on a software RAID 0. Performance is very good until I hit 2 "holes"
> during which my write stalls for a few seconds.
> Mkfs parameters:
> mkfs.xfs -b size=4096 -s size=4096 -d agcount=2 -i size=2048
> The RAID 0 is made of 2 SATA disks of 500 GB each.

What happens when you make the filesystem using defaults?

mkfs.xfs /dev/[device]

Not sure if it is related to your issue, but your manual agcount setting
seems really low.  agcount greatly affects parallelism.  With a manual
setting of 2, you're dictating serial read/write stream behavior to/from
each drive.  This is not good.

I have a server with a single 500GB SATA drive with two XFS filesystem
partitions for data, each of 100GB, and a 35GB EXT partition for the /
filesystem.  Over half the drive space is unallocated.  Yet each XFS
filesystem has 4 default allocation groups.  If I were to create two
more 100GB filesystems, I'd end up with 16 AGs for 400GB worth of XFS
filesystems on a single 500GB drive.

meta-data=/dev/sda6    isize=256    agcount=4, agsize=6103694 blks
         =             sectsz=512   attr=2
data     =             bsize=4096   blocks=24414775, imaxpct=25
         =             sunit=0      swidth=0 blks
naming   =version 2    bsize=4096
log      =internal     bsize=4096   blocks=11921, version=2
         =             sectsz=512   sunit=0 blks, lazy-count=0
realtime =none         extsz=4096   blocks=0, rtextents=0

My suggestion would be to create the filesystem using default values and
see what you get.  2.6.18 is rather old, and I don't know if XFS picks
up the mdraid config and uses that info accordingly.  Newer versions of
XFS do this automatically and correctly, so you don't need to manually
specify anything with mkfs.xfs.

If default mkfs values still yield issues/problems, remake the
filesystem specifying '-d sw=2' and retest.
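As a sketch of how the full stripe geometry could be given explicitly (the 64 KiB chunk size and the /dev/md0 device name are assumptions, not values confirmed in this thread; the script only prints the command so it can be reviewed before actually running it on a real device):

```shell
# Assumed values: md's default 64 KiB chunk and a 2-disk RAID0 at /dev/md0.
CHUNK_KB=64
NDISKS=2
DEV=/dev/md0

# su = md chunk size, sw = number of data disks in the stripe.
# Print the command first so it can be reviewed before running it.
echo "mkfs.xfs -d su=${CHUNK_KB}k,sw=${NDISKS} ${DEV}"
```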

You specified '-b size=4096'.  This is the default for block size so
there's no need to specify it.

You specified '-s size=4096'.  This needs to match the sector size of
the underlying physical disk, which is 512 bytes in your case.  This may
be part of your problem as well.

You specified '-d agcount=2'.  From man mkfs.xfs:

"The data section of the filesystem is divided into _value_ allocation
groups (default value is scaled automatically based on the underlying
device size)."

My guess is that mkfs.xfs with no manual agcount forced would yield
something like 32-40 allocation groups on your RAID0 1TB XFS
filesystem.  Theoretically, this should boost your performance 16-20
times over your current agcount setting of 2 allocation groups.  In
reality the boost won't be nearly that great, but your performance
should be greatly improved nonetheless.

-- 
Stan



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Question regarding performance on big files.
  2010-09-20 19:48 ` Stan Hoeppner
@ 2010-09-22 10:26   ` Mathieu AVILA
  2010-09-22 20:41     ` Stan Hoeppner
  0 siblings, 1 reply; 6+ messages in thread
From: Mathieu AVILA @ 2010-09-22 10:26 UTC (permalink / raw)
  To: xfs

  Hello,

Thank you for your quick answer.

I have run my test again with default parameters for mkfs.
I still see the issue: for about 20 seconds, the writes are either 
stalled or very slow.
I have run "vmstat" at the same time as "dd", and it appears that the 
block device continues to receive write requests while "dd" is blocked 
in the kernel.
With blktrace, I can see that during this period the block device 
receives a lot of small write requests scattered across the volume, 
ranging from the start up to the point where the file stopped writing. 
During the other periods, the volume is written normally, starting at 
offset 0 and filling the disk sequentially.

Could this be an effect of tree rebalancing for extent management (both 
the big file's inode extent tree and the free space trees)? Could it be 
a hardware problem? Have you ever seen this issue before?

--
Mathieu Avila


On 20/09/2010 21:48, Stan Hoeppner wrote:
> Mathieu AVILA put forth on 9/20/2010 12:04 PM:
>>   Hello XFS team,
>>
>> I have run into trouble with XFS, but excuse me if this question has
>> been asked a dozen times.
>>
>> I am filling a very big file on an XFS filesystem on Linux that sits
>> on a software RAID 0. Performance is very good until I hit 2 "holes"
>> during which my write stalls for a few seconds.
>> Mkfs parameters:
>> mkfs.xfs -b size=4096 -s size=4096 -d agcount=2 -i size=2048
>> The RAID 0 is made of 2 SATA disks of 500 GB each.
> What happens when you make the filesystem using defaults?
>
> mkfs.xfs /dev/[device]
>
> Not sure if it is related to your issue, but your manual agcount setting
> seems really low.  agcount greatly affects parallelism.  With a manual
> setting of 2, you're dictating serial read/write stream behavior to/from
> each drive.  This is not good.
>
> I have a server with a single 500GB SATA drive with two XFS filesystem
> partitions for data, each of 100GB, and a 35GB EXT partition for the /
> filesystem.  Over half the drive space is unallocated.  Yet each XFS
> filesystem has 4 default allocation groups.  If I were to create two
> more 100GB filesystems, I'd end up with 16 AGs for 400GB worth of XFS
> filesystems on a single 500GB drive.
>
> meta-data=/dev/sda6    isize=256    agcount=4, agsize=6103694 blks
>           =             sectsz=512   attr=2
> data     =             bsize=4096   blocks=24414775, imaxpct=25
>           =             sunit=0      swidth=0 blks
> naming   =version 2    bsize=4096
> log      =internal     bsize=4096   blocks=11921, version=2
>           =             sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none         extsz=4096   blocks=0, rtextents=0
>
> My suggestion would be to create the filesystem using default values and
> see what you get.  2.6.18 is rather old, and I don't know if XFS picks
> up the mdraid config and uses that info accordingly.  Newer versions of
> XFS do this automatically and correctly, so you don't need to manually
> specify anything with mkfs.xfs.
>
> If default mkfs values still yield issues/problems, remake the
> filesystem specifying '-d sw=2' and retest.
>
> You specified '-b size=4096'.  This is the default for block size so
> there's no need to specify it.
>
> You specified '-s size=4096'.  This needs to match the sector size of
> the underlying physical disk, which is 512 bytes in your case.  This may
> be part of your problem as well.
>
> You specified '-d agcount=2'.  From man mkfs.xfs:
>
> "The data section of the filesystem is divided into _value_ allocation
> groups (default value is scaled automatically based on the underlying
> device size)."
>
> My guess is that mkfs.xfs with no manual agcount forced would yield
> something like 32-40 allocation groups on your RAID0 1TB XFS
> filesystem.  Theoretically, this should boost your performance 16-20
> times over your current agcount setting of 2 allocation groups.  In
> reality the boost won't be nearly that great, but your performance
> should be greatly improved nonetheless.
>


-- 
*Mathieu Avila*
IT & Integration Engineer
mathieu.avila@opencubetech.com

OpenCube Technologies http://www.opencubetech.com
Parc Technologique du Canal, 9 avenue de l'Europe
31520 Ramonville St Agne - FRANCE
Tel. : +33 (0) 561 285 606 - Fax : +33 (0) 561 285 635


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Question regarding performance on big files.
  2010-09-22 10:26   ` Mathieu AVILA
@ 2010-09-22 20:41     ` Stan Hoeppner
  2010-09-23  8:55       ` Mathieu AVILA
  0 siblings, 1 reply; 6+ messages in thread
From: Stan Hoeppner @ 2010-09-22 20:41 UTC (permalink / raw)
  To: xfs

Mathieu AVILA put forth on 9/22/2010 5:26 AM:

> I have run my test again with default parameters for mkfs.
> I still see the issue: for about 20 seconds, the writes are either
> stalled or very slow.
> I have run "vmstat" at the same time as "dd", and it appears that the
> block device continues to receive write requests while "dd" is blocked
> in the kernel.
> With blktrace, I can see that during this period the block device
> receives a lot of small write requests scattered across the volume,
> ranging from the start up to the point where the file stopped writing.
> During the other periods, the volume is written normally, starting at
> offset 0 and filling the disk sequentially.

What happens with "dd if=/dev/zero of=/DATA/big oflag=direct"?  You said
the copy is hanging in the kernel.  Maybe a buffer cache issue?

What fstab mount options are you using for this filesystem?

> Could this be an effect of tree rebalancing for extent management (both
> the big file's inode extent tree and the free space trees)? Could it be
> a hardware problem? Have you ever seen this issue before?

WRT tree rebalancing, that's beyond my knowledge level and someone else
will need to jump into this thread.  If it's a hardware problem you
should be seeing something in dmesg or the kernel log, or both.  If
you're not seeing controller or device errors it's probably not a
hardware problem.  Have you tried this same test with only one of those
two 500GB drives, no mdraid stripe?  That would eliminate any possible
issues with your mdraid implementation.  Speaking of which, could you
please share your mdraid parameters for this stripe set?  That could be
a factor as well.

-- 
Stan


> -- 
> Mathieu Avila
> 
> 
> On 20/09/2010 21:48, Stan Hoeppner wrote:
>> Mathieu AVILA put forth on 9/20/2010 12:04 PM:
>>>   Hello XFS team,
>>>
>>> I have run into trouble with XFS, but excuse me if this question has
>>> been asked a dozen times.
>>>
>>> I am filling a very big file on an XFS filesystem on Linux that sits
>>> on a software RAID 0. Performance is very good until I hit 2 "holes"
>>> during which my write stalls for a few seconds.
>>> Mkfs parameters:
>>> mkfs.xfs -b size=4096 -s size=4096 -d agcount=2 -i size=2048
>>> The RAID 0 is made of 2 SATA disks of 500 GB each.
>> What happens when you make the filesystem using defaults?
>>
>> mkfs.xfs /dev/[device]
>>
>> Not sure if it is related to your issue, but your manual agcount setting
>> seems really low.  agcount greatly affects parallelism.  With a manual
>> setting of 2, you're dictating serial read/write stream behavior to/from
>> each drive.  This is not good.
>>
>> I have a server with a single 500GB SATA drive with two XFS filesystem
>> partitions for data, each of 100GB, and a 35GB EXT partition for the /
>> filesystem.  Over half the drive space is unallocated.  Yet each XFS
>> filesystem has 4 default allocation groups.  If I were to create two
>> more 100GB filesystems, I'd end up with 16 AGs for 400GB worth of XFS
>> filesystems on a single 500GB drive.
>>
>> meta-data=/dev/sda6    isize=256    agcount=4, agsize=6103694 blks
>>           =             sectsz=512   attr=2
>> data     =             bsize=4096   blocks=24414775, imaxpct=25
>>           =             sunit=0      swidth=0 blks
>> naming   =version 2    bsize=4096
>> log      =internal     bsize=4096   blocks=11921, version=2
>>           =             sectsz=512   sunit=0 blks, lazy-count=0
>> realtime =none         extsz=4096   blocks=0, rtextents=0
>>
>> My suggestion would be to create the filesystem using default values and
>> see what you get.  2.6.18 is rather old, and I don't know if XFS picks
>> up the mdraid config and uses that info accordingly.  Newer versions of
>> XFS do this automatically and correctly, so you don't need to manually
>> specify anything with mkfs.xfs.
>>
>> If default mkfs values still yield issues/problems, remake the
>> filesystem specifying '-d sw=2' and retest.
>>
>> You specified '-b size=4096'.  This is the default for block size so
>> there's no need to specify it.
>>
>> You specified '-s size=4096'.  This needs to match the sector size of
>> the underlying physical disk, which is 512 bytes in your case.  This may
>> be part of your problem as well.
>>
>> You specified '-d agcount=2'.  From man mkfs.xfs:
>>
>> "The data section of the filesystem is divided into _value_ allocation
>> groups (default value is scaled automatically based on the underlying
>> device size)."
>>
>> My guess is that mkfs.xfs with no manual agcount forced would yield
>> something like 32-40 allocation groups on your RAID0 1TB XFS
>> filesystem.  Theoretically, this should boost your performance 16-20
>> times over your current agcount setting of 2 allocation groups.  In
>> reality the boost won't be nearly that great, but your performance
>> should be greatly improved nonetheless.
>>
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Question regarding performance on big files.
  2010-09-22 20:41     ` Stan Hoeppner
@ 2010-09-23  8:55       ` Mathieu AVILA
  2010-09-23 22:03         ` Stan Hoeppner
  0 siblings, 1 reply; 6+ messages in thread
From: Mathieu AVILA @ 2010-09-23  8:55 UTC (permalink / raw)
  To: xfs



Things are mostly solved now.
I'm sharing the information here for others who may run into the same troubles.

1/ Reverting the BIOS to an older version (AMI 1.1 instead of AMI 2.0) 
masked the issue (I/O management somewhere in the controller?). But 
this is not satisfying, as an older BIOS may not handle my hardware 
correctly, and may crash the box due to hardware/software 
incompatibility.
Besides, there was no warning/error message from the kernel: from its 
point of view, everything was fine. So I switched back to the recent 
version.

2/ I had set very aggressive values for the page cache:
     vm.dirty_ratio = 3
     vm.dirty_background_ratio = 0
In my case, on a 6GB server, this leaves 184 MB for the page cache. 
That is really low, but it was done deliberately to avoid caching too 
much and having the kernel flush too much at once. The counterpart is 
that when my filesystem needs to flush a lot of metadata pages, the 
page cache fills up and the whole application freezes, waiting for 
those I/Os to complete.
With these parameters:
     vm.dirty_ratio = 20
     vm.dirty_background_ratio = 5
the small writes are amortized within the stream of data writes from 
the application, and the application no longer freezes.
(So you were right: there was a page cache issue.)
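For anyone reproducing this, a minimal sketch of inspecting and applying these writeback thresholds (the paths are the standard Linux sysctl locations; writing new values requires root):

```shell
# Inspect the current writeback thresholds (readable without privileges).
for k in dirty_ratio dirty_background_ratio; do
    printf 'vm.%s = %s\n' "$k" "$(cat /proc/sys/vm/$k)"
done

# To apply the relaxed values (root required), either:
#   sysctl -w vm.dirty_ratio=20 vm.dirty_background_ratio=5
# or persist them in /etc/sysctl.conf and reload with `sysctl -p`.
```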

The question stands: why does XFS generate such a burst of small I/O 
writes throughout the disk at around 688 GB?

My fstab mount options are classical ones:
"defaults,nobarrier,noatime,nodiratime"

Maybe the software RAID 0 helped trigger the problem, too: I don't know 
whether writes on a RAID can generate more I/O than on a direct disk. I 
guess so (I/O fragmentation), but that's only a guess.

--
Mathieu Avila


On 22/09/2010 22:41, Stan Hoeppner wrote:
> Mathieu AVILA put forth on 9/22/2010 5:26 AM:
>
>> I have run my test again with default parameters for mkfs.
>> I still see the issue: for about 20 seconds, the writes are either
>> stalled or very slow.
>> I have run "vmstat" at the same time as "dd", and it appears that the
>> block device continues to receive write requests while "dd" is blocked
>> in the kernel.
>> With blktrace, I can see that during this period the block device
>> receives a lot of small write requests scattered across the volume,
>> ranging from the start up to the point where the file stopped writing.
>> During the other periods, the volume is written normally, starting at
>> offset 0 and filling the disk sequentially.
> What happens with "dd if=/dev/zero of=/DATA/big oflag=direct"?  You said
> the copy is hanging in the kernel.  Maybe a buffer cache issue?
>
> What fstab mount options are you using for this filesystem?
>
>> Could this be an effect of tree rebalancing for extent management (both
>> the big file's inode extent tree and the free space trees)? Could it be
>> a hardware problem? Have you ever seen this issue before?
> WRT tree rebalancing, that's beyond my knowledge level and someone else
> will need to jump into this thread.  If it's a hardware problem you
> should be seeing something in dmesg or the kernel log, or both.  If
> you're not seeing controller or device errors it's probably not a
> hardware problem.  Have you tried this same test with only one of those
> two 500GB drives, no mdraid stripe?  That would eliminate any possible
> issues with your mdraid implementation.  Speaking of which, could you
> please share your mdraid parameters for this stripe set?  That could be
> a factor as well.
>


-- 
*Mathieu Avila*
IT & Integration Engineer
mathieu.avila@opencubetech.com

OpenCube Technologies http://www.opencubetech.com
Parc Technologique du Canal, 9 avenue de l'Europe
31520 Ramonville St Agne - FRANCE
Tel. : +33 (0) 561 285 606 - Fax : +33 (0) 561 285 635



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Question regarding performance on big files.
  2010-09-23  8:55       ` Mathieu AVILA
@ 2010-09-23 22:03         ` Stan Hoeppner
  0 siblings, 0 replies; 6+ messages in thread
From: Stan Hoeppner @ 2010-09-23 22:03 UTC (permalink / raw)
  To: xfs

Mathieu AVILA put forth on 9/23/2010 3:55 AM:

> The small writes are amortized within the stream of data writes from
> the application, and the application no longer freezes.
> (So you were right: there was a page cache issue.)

Given what you've described about the streaming write behavior of your
application, I'd suggest you rewrite it and use O_DIRECT writes to
bypass the page cache completely.  You may also want to look into using
the XFS realtime subvolume feature.

-- 
Stan


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-09-23 22:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-20 17:04 Question regarding performance on big files Mathieu AVILA
2010-09-20 19:48 ` Stan Hoeppner
2010-09-22 10:26   ` Mathieu AVILA
2010-09-22 20:41     ` Stan Hoeppner
2010-09-23  8:55       ` Mathieu AVILA
2010-09-23 22:03         ` Stan Hoeppner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox