public inbox for linux-xfs@vger.kernel.org
* Random write result differences between RAID device and XFS
@ 2016-01-29 10:53 Christian Affolter
  2016-01-29 22:25 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Affolter @ 2016-01-29 10:53 UTC (permalink / raw)
  To: xfs

Hi everyone,

I'm trying to understand the differences in bandwidth and IOPS I see
while running a random-write, full-stripe-width-aligned fio test (using
libaio with direct IO) on a hardware RAID 6 raw device versus on the
same device with the XFS file system on top of it.

On the raw device I get:
write: io=24828MB, bw=423132KB/s, iops=137, runt= 60085msec

With XFS on top of it:
write: io=14658MB, bw=249407KB/s, iops=81, runt= 60182msec


The hardware RAID 6 volume consists of 5 HDDs (3 data disks); the
stripe unit size is 1 MiB and the full stripe width is 3 MiB.

XFS was initialized and mounted with the following commands:

mkfs.xfs -d su=1024k,sw=3 -L LV-TEST-02 /dev/sdd
mount -o inode64,noatime -L LV-TEST-02 /mnt/lv-test-02

mkfs.xfs version 3.2.2

xfs_info /mnt/lv-test-02
meta-data=/dev/sdd               isize=256    agcount=16, agsize=819200 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=13106688, imaxpct=25
         =                       sunit=256    swidth=768 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=6399, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
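As a quick cross-check of the geometry above (sunit/swidth in xfs_info are
reported in bsize-sized filesystem blocks), the numbers translate back to
the su/sw values passed to mkfs.xfs:

```python
# Sanity-check the xfs_info output against the mkfs.xfs parameters.
# Values below are copied from the xfs_info listing above.
bsize = 4096          # data block size (bytes)
sunit_blocks = 256    # sunit from xfs_info, in filesystem blocks
swidth_blocks = 768   # swidth from xfs_info, in filesystem blocks

sunit_bytes = sunit_blocks * bsize
swidth_bytes = swidth_blocks * bsize

print(sunit_bytes // 1024)     # 1024 KiB -> matches su=1024k
print(swidth_bytes // 2**20)   # 3 MiB -> matches sw=3 (3 data disks x 1 MiB)
```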



The RAID controller does not export optimal_io_size.


Controller details:
Product Name = AVAGO 3108 MegaRAID
FW Package Build = 24.9.0-0022
BIOS Version = 6.25.03.0_4.17.08.00_0x060E0300
FW Version = 4.290.00-4536
Driver Name = megaraid_sas
Driver Version = 06.808.16.00-rc1


Virtual drive:

--------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache sCC      Size Name
--------------------------------------------------------------------
1/2   RAID6 Optl  RW     Yes      RWBD  -   49.998 GB vd-hdd-test-01
--------------------------------------------------------------------

R=Read Ahead
WB=WriteBack (with a battery backup)
D=Direct IO

The physical disk cache is disabled.


Disks:
5x HGST HUH728080AL5200 (Firmware Revision = A515)


Kernel:
4.4.0-2.el7.elrepo.x86_64


fio command and output for RAID raw device:

fio --filename=/dev/sdd \
    --direct=1 \
    --rw=randwrite \
    --ioengine=libaio \
    --iodepth=16 \
    --numjobs=1 \
    --runtime=60 \
    --exec_prerun="/opt/MegaRAID/storcli/storcli64 /c0 flushcache" \
    --name=direct-raid-hdd-random-write-full-stripe-aligned-3072k \
    --bs=3072k

direct-raid-hdd-random-write-full-stripe-aligned-3072k: (g=0):
rw=randwrite, bs=3M-3M/3M-3M/3M-3M, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
direct-raid-hdd-random-write-full-stripe-aligned-3072k : Saving output
of prerun in
direct-raid-hdd-random-write-full-stripe-aligned-3072k.prerun.txt
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/375.0MB/0KB /s] [0/125/0 iops]
[eta 00m:00s]
direct-raid-hdd-random-write-full-stripe-aligned-3072k: (groupid=0,
jobs=1): err= 0: pid=1847: Fri Jan 29 11:47:17 2016
  write: io=24828MB, bw=423132KB/s, iops=137, runt= 60085msec
    slat (usec): min=250, max=91308, avg=7250.27, stdev=11767.73
    clat (msec): min=8, max=223, avg=108.89, stdev=32.22
     lat (msec): min=8, max=224, avg=116.14, stdev=32.19
    clat percentiles (msec):
     |  1.00th=[    9],  5.00th=[   43], 10.00th=[   78], 20.00th=[   91],
     | 30.00th=[   99], 40.00th=[  106], 50.00th=[  113], 60.00th=[  119],
     | 70.00th=[  126], 80.00th=[  133], 90.00th=[  145], 95.00th=[  153],
     | 99.00th=[  169], 99.50th=[  176], 99.90th=[  198], 99.95th=[  202],
     | 99.99th=[  225]
    bw (KB  /s): min=348681, max=2599384, per=100.00%, avg=423757.22,
stdev=204979.51
    lat (msec) : 10=4.22%, 20=0.76%, 50=0.19%, 100=25.86%, 250=68.97%
  cpu          : usr=2.49%, sys=3.57%, ctx=2959, majf=0, minf=1642
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.8%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=8276/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=24828MB, aggrb=423131KB/s, minb=423131KB/s, maxb=423131KB/s,
mint=60085msec, maxt=60085msec

Disk stats (read/write):
  sdd: ios=59/98996, merge=0/0, ticks=150/8434406, in_queue=8443626,
util=100.00%



fio command and output for XFS:

fio --directory=/mnt/lv-test-02 \
    --filename=test.fio  \
    --size=30g \
    --direct=1 \
    --rw=randwrite \
    --ioengine=libaio \
    --iodepth=16 \
    --numjobs=1 \
    --runtime=60 \
    --exec_prerun="/opt/MegaRAID/storcli/storcli64 /c0 flushcache" \
    --name=xfs-hdd-random-write-full-stripe-aligned-3072k \
    --bs=3072k


xfs-hdd-random-write-full-stripe-aligned-3072k: (g=0): rw=randwrite,
bs=3M-3M/3M-3M/3M-3M, ioengine=libaio, iodepth=16
fio-2.2.8
Starting 1 process
xfs-hdd-random-write-full-stripe-aligned-3072k: Laying out IO file(s) (1
file(s) / 30720MB)
xfs-hdd-random-write-full-stripe-aligned-3072k : Saving output of prerun
in xfs-hdd-random-write-full-stripe-aligned-3072k.prerun.txt
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/186.0MB/0KB /s] [0/62/0 iops]
[eta 00m:00s]
xfs-hdd-random-write-full-stripe-aligned-3072k: (groupid=0, jobs=1):
err= 0: pid=1899: Fri Jan 29 11:50:21 2016
  write: io=14658MB, bw=249407KB/s, iops=81, runt= 60182msec
    slat (usec): min=231, max=133647, avg=12279.35, stdev=20234.23
    clat (msec): min=4, max=1987, avg=184.75, stdev=81.74
     lat (msec): min=5, max=1987, avg=197.03, stdev=83.84
    clat percentiles (msec):
     |  1.00th=[    7],  5.00th=[    8], 10.00th=[  110], 20.00th=[  143],
     | 30.00th=[  161], 40.00th=[  174], 50.00th=[  188], 60.00th=[  202],
     | 70.00th=[  217], 80.00th=[  237], 90.00th=[  269], 95.00th=[  293],
     | 99.00th=[  363], 99.50th=[  416], 99.90th=[  742], 99.95th=[  922],
     | 99.99th=[ 1991]
    bw (KB  /s): min=130620, max=2460047, per=100.00%, avg=250120.43,
stdev=212307.87
    lat (msec) : 10=8.04%, 100=0.88%, 250=76.44%, 500=14.31%, 750=0.25%
    lat (msec) : 1000=0.04%, 2000=0.04%
  cpu          : usr=1.10%, sys=2.30%, ctx=1891, majf=0, minf=1096
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=99.7%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=4886/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=14658MB, aggrb=249406KB/s, minb=249406KB/s, maxb=249406KB/s,
mint=60182msec, maxt=60182msec

Disk stats (read/write):
  sdd: ios=0/58627, merge=0/12, ticks=0/8552722, in_queue=8559550,
util=99.84%


Many thanks in advance
Chris

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Random write result differences between RAID device and XFS
  2016-01-29 10:53 Random write result differences between RAID device and XFS Christian Affolter
@ 2016-01-29 22:25 ` Dave Chinner
  2016-01-30 10:43   ` Christian Affolter
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2016-01-29 22:25 UTC (permalink / raw)
  To: Christian Affolter; +Cc: xfs

On Fri, Jan 29, 2016 at 11:53:35AM +0100, Christian Affolter wrote:
> Hi everyone,
> 
> I'm trying to understand the differences of some bandwidth and IOPs test
> results I see while running a random-write full-stripe-width aligned fio
> test (using libaio with direct IO) on a hardware RAID 6 raw device
> versus on the same device with the XFS file system on top of it.
> 
> On the raw device I get:
> write: io=24828MB, bw=423132KB/s, iops=137, runt= 60085msec
> 
> With XFS on top of it:
> write: io=14658MB, bw=249407KB/s, iops=81, runt= 60182msec

Now repeat with a file that is contiguously allocated before you
start. And also perhaps with the "swalloc" mount option.
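For example, a sequential write pass over the file before the random-write
job lays the blocks out contiguously. A sketch of such a two-stage fio job
file, reusing the paths and sizes from your run (stage names are made up):

```ini
; Stage 1 writes the file sequentially so it is allocated contiguously;
; stage 2 ("stonewall" waits for stage 1) reruns the random-write test.
[global]
directory=/mnt/lv-test-02
filename=test.fio
size=30g
bs=3072k
direct=1
ioengine=libaio

[prealloc]
rw=write

[random-write]
stonewall
rw=randwrite
iodepth=16
runtime=60
```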

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Random write result differences between RAID device and XFS
  2016-01-29 22:25 ` Dave Chinner
@ 2016-01-30 10:43   ` Christian Affolter
  2016-02-01  5:46     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Affolter @ 2016-01-30 10:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hi Dave,

On 29.01.2016 23:25, Dave Chinner wrote:
> On Fri, Jan 29, 2016 at 11:53:35AM +0100, Christian Affolter wrote:
>> Hi everyone,
>>
>> I'm trying to understand the differences of some bandwidth and IOPs test
>> results I see while running a random-write full-stripe-width aligned fio
>> test (using libaio with direct IO) on a hardware RAID 6 raw device
>> versus on the same device with the XFS file system on top of it.
>>
>> On the raw device I get:
>> write: io=24828MB, bw=423132KB/s, iops=137, runt= 60085msec
>>
>> With XFS on top of it:
>> write: io=14658MB, bw=249407KB/s, iops=81, runt= 60182msec
> 
> Now repeat with a file that is contiguously allocated before you
> start. And also perhaps with the "swalloc" mount option.

Wow, thanks! After specifying --fallocate=none (instead of the default
fallocate=posix), bandwidth and IOPS increase and are even higher than
on the raw device:

write: io=30720MB, bw=599232KB/s, iops=195, runt= 52496msec

I'm eager to learn what's going on behind the scenes; can you give a
short explanation?

Btw. mounting the volume with "swalloc" didn't make any change.


Thanks a lot!
Chris


* Re: Random write result differences between RAID device and XFS
  2016-01-30 10:43   ` Christian Affolter
@ 2016-02-01  5:46     ` Dave Chinner
  2016-02-01  8:59       ` Christian Affolter
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2016-02-01  5:46 UTC (permalink / raw)
  To: Christian Affolter; +Cc: xfs

On Sat, Jan 30, 2016 at 11:43:56AM +0100, Christian Affolter wrote:
> Hi Dave,
> 
> On 29.01.2016 23:25, Dave Chinner wrote:
> > On Fri, Jan 29, 2016 at 11:53:35AM +0100, Christian Affolter wrote:
> >> Hi everyone,
> >>
> >> I'm trying to understand the differences of some bandwidth and IOPs test
> >> results I see while running a random-write full-stripe-width aligned fio
> >> test (using libaio with direct IO) on a hardware RAID 6 raw device
> >> versus on the same device with the XFS file system on top of it.
> >>
> >> On the raw device I get:
> >> write: io=24828MB, bw=423132KB/s, iops=137, runt= 60085msec
> >>
> >> With XFS on top of it:
> >> write: io=14658MB, bw=249407KB/s, iops=81, runt= 60182msec
> > 
> > Now repeat with a file that is contiguously allocated before you
> > start. And also perhaps with the "swalloc" mount option.
> 
> Wow, thanks! After specifying --fallocate=none (instead of the default
> fallocate=posix), bandwidth and iops increases and are even higher than
> on the raw device:
> 
> write: io=30720MB, bw=599232KB/s, iops=195, runt= 52496msec
> 
> I'm eager to learn what's going on behind the scenes, can you give a
> short explanation?

Usually when concurrent direct IO writes are slower than the raw
device it's because something is causing IO submission
serialisation.  Usually that's to do with writes that extend the
file because that can require the inode to be locked exclusively.
Whatever behaviour the fio configuration change modified, it removed
the IO submission serialisation and so it's now running at full disk
speed.

As to why XFS is faster than the raw block device, the XFS file
is only 30GB, so the random writes are only seeking a short
distance compared to the block device test which is seeking across
the whole device.
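A rough back-of-the-envelope model of that effect: for offsets uniformly
distributed over a span S, the mean distance between two independent random
offsets is S/3, so a 30 GB file on this ~50 GB volume cuts the average seek
distance by roughly 40% (an illustrative model only; real head movement
also depends on queue reordering and zone geometry):

```python
import random

def mean_seek(span_gb, samples=200_000, seed=0):
    # Mean absolute distance between consecutive uniform random offsets,
    # as a stand-in for average seek distance over a span of span_gb.
    rng = random.Random(seed)
    prev = rng.uniform(0.0, span_gb)
    total = 0.0
    for _ in range(samples):
        cur = rng.uniform(0.0, span_gb)
        total += abs(cur - prev)
        prev = cur
    return total / samples

print(mean_seek(50.0))  # whole ~50 GB device: about 50/3 = ~16.7
print(mean_seek(30.0))  # 30 GB file: about 30/3 = ~10.0
```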

> Btw. mounting the volume with "swalloc" didn't make any change.

Which means there is no performance differential between stripe unit
and stripe width aligned writes in this test on your hardware.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Random write result differences between RAID device and XFS
  2016-02-01  5:46     ` Dave Chinner
@ 2016-02-01  8:59       ` Christian Affolter
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Affolter @ 2016-02-01  8:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hello Dave,

On 01.02.2016 06:46, Dave Chinner wrote:
> On Sat, Jan 30, 2016 at 11:43:56AM +0100, Christian Affolter wrote:
>> Hi Dave,
>>
>> On 29.01.2016 23:25, Dave Chinner wrote:
>>> On Fri, Jan 29, 2016 at 11:53:35AM +0100, Christian Affolter wrote:
>>>> Hi everyone,
>>>>
>>>> I'm trying to understand the differences of some bandwidth and IOPs test
>>>> results I see while running a random-write full-stripe-width aligned fio
>>>> test (using libaio with direct IO) on a hardware RAID 6 raw device
>>>> versus on the same device with the XFS file system on top of it.
>>>>
>>>> On the raw device I get:
>>>> write: io=24828MB, bw=423132KB/s, iops=137, runt= 60085msec
>>>>
>>>> With XFS on top of it:
>>>> write: io=14658MB, bw=249407KB/s, iops=81, runt= 60182msec
>>>
>>> Now repeat with a file that is contiguously allocated before you
>>> start. And also perhaps with the "swalloc" mount option.
>>
>> Wow, thanks! After specifying --fallocate=none (instead of the default
>> fallocate=posix), bandwidth and iops increases and are even higher than
>> on the raw device:
>>
>> write: io=30720MB, bw=599232KB/s, iops=195, runt= 52496msec
>>
>> I'm eager to learn what's going on behind the scenes, can you give a
>> short explanation?
> 
> Usually when concurrent direct IO writes are slower than the raw
> device it's because something is causing IO submission
> serialisation.  Usually that's to do with writes that extend the
> file because that can require the inode to be locked exclusively.
> Whatever behaviour the fio configuration change modified, it removed
> the IO submission serialisation and so it's now running at full disk
> speed.
> 
> As to why XFS is faster than the raw block device, the XFS file
> is only 30GB, so the random writes are only seeking a short
> distance compared to the block device test which is seeking across
> the whole device.
> 
>> Btw. mounting the volume with "swalloc" didn't make any change.
> 
> Which means there is no performance differential between stripe unit
> and stripe width aligned writes in this test on your hardware.

Thank you so much for the detailed explanation and taking the time to help.


Best,
Chris

